The effect of question order on performance

and evaluations of performance

Ana Beatriz Quintero Guerra

University of Amsterdam

July, 2016

Abstract: This study investigates how question order in terms of difficulty may influence

not only final results, but also the perceptions individuals have about their performance in testing situations. The results of an economic experiment confirm the presence of a question order bias affecting impressions of ability and difficulty: subjects that have easy items at the beginning expected a lower overall performance and perceived the test as more difficult. Question order also affects future choices: participants starting with easier questions choose easier tasks more often compared to those that had hard questions at the beginning. The impact of feedback in the form of intermediate scores was also examined but no significant differences were found between the feedback conditions.

Keywords: self-assessment, question order, feedback

Master thesis, MSc Business Economics, track Managerial Economics and Strategy

Statement of Originality

This document is written by Student Ana Beatriz Quintero Guerra who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it.

The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Table of contents

1 Introduction

2 Literature review

2.1. Self-assessment of abilities

2.2. Consequences of self-assessment

2.3. Difficulty, question order and evaluations of performance

2.4. Question order and choices

2.5 Feedback and self-assessment

3 Experimental design

3.1. Tasks and incentive scheme

3.2. Treatments and performance measures

3.3. Sample description

4 Hypotheses

5 Results

5.1 Question order and performance

5.2 Perceptions of ability

5.3 Perceptions of task difficulty

5.4 Motivation and enjoyment

5.5 Levels of conformity

5.6 Task choice

5.7 Feedback

6 Discussion

7 Conclusions

References

Appendix


1. Introduction

Self-assessment is a crucial component of an individual's daily life. Whether we can accomplish our goals, pursue our dreams and improve our chances for the future depends on the perceptions we have about ourselves and our environment. Performance appraisals have become common features of the workplace over the last two decades and are therefore extremely important for career development. They give both employers and employees the opportunity to assess their contributions to the organization and to increase productivity (Prewitt, 2007). There is also evidence of gender differences in self-assessment practices. Internal research by the company Hewlett-Packard in 2004 found that women applied for a promotion only when they met 100% of the qualifications listed for the job, while men applied when they met just 60% of the requirements: even overqualified and well-prepared women hold back because they think they are not good enough. In addition to career development, self-perceptions are of great importance in educational progress. Through self-assessment, students can monitor their learning process, develop responsibility and independence, and set goals for examinations. Accurate self-monitoring can also lead to better performance in examinations.

The present study examines how question order may affect performance and perceptions of performance. DeMoranville et al. (2008) investigate how question order affects customer satisfaction and service quality measures in a survey. The authors find that presenting general questions followed by specific ones (and not the reverse order) is the optimal questionnaire format in terms of predictive accuracy. Past studies have investigated the effect of question order on survey responses in various domains (McFarland, 1981; Moore, 2002; Auspurg and Jackle, 2015). These studies conclude that the order in which questions are posed shapes people's judgements of what is being asked. In a different setting, Michael and Garry (2015) study how question order may affect jurors and eyewitnesses in the courtroom. The authors showed that judges were more confident in eyewitness testimonies when easy questions were asked before harder ones. This paper investigates how starting with different levels of difficulty may influence final results as well as evaluations of performance on a test. In an economic experiment, participants are asked to complete a set of tasks arranged in a specific order: easy to difficult, or difficult to easy. Besides looking at the relation between scores and question order, participants are asked to evaluate their progress during the test in order to measure their perceptions of performance.

A possible explanation for performance differences across question orders is that people may assess their abilities imperfectly. Because they are uncertain about their performance, they might underestimate or overestimate themselves, and this may influence their behavior. One way of mitigating this uncertainty is to provide subjects with information about their performance: if individuals know their actual performance, differences between question order conditions may be reduced. To check this, the study examines the influence of feedback on performance. In today's complex business environment, firms frequently provide performance feedback in order to encourage employees to attend to all their tasks (Christ et al., 2016). However, some studies have found that feedback may not always be a motivating tool for companies to use (Kuhnen and Tymula, 2012). This paper therefore investigates whether information on the progress of a task improves or deteriorates performance.

Varying the order of questions by difficulty and its effect on self-assessments has been examined by authors such as Weinstein and Roediger (2010, 2012) and Jackson and Greene (2014). In a laboratory setting, these authors use a general knowledge test with questions ordered in ascending or descending difficulty to analyze potential differences in performance and judgements of performance. However, past studies in this research area have some limitations. First, there is little variation in personal background, as most participants share approximately the same age, educational background and nationality. Furthermore, past research omits one of the most important mediators between task difficulty and performance: motivation. Conformity levels depending on question ordering by difficulty have also received little attention in the literature.

This paper enriches the literature on the effect of question order on evaluations of performance in a test. As a novel aspect, it investigates how question order affects future choices. Although there is no previous literature on this point, question order may have an impact on decision-making processes: when people have different perceptions of their abilities, they may prefer tasks with different levels of difficulty. This is of great importance in various domains, since the feeling of a bad performance may lead to an incorrect decision. Hackett and Betz (1995) and Correl (2001) point out the importance of self-assessment in career choices, especially among women: they found that women tend to avoid scientific careers because they underestimate their capabilities. Inappropriate choices might thus be made due to uncertainty about past performance. Because making a wrong decision is costly, providing feedback may lead individuals to choose differently and may increase the accuracy of these decisions. Moreover, this study highlights the role of motivation as a factor that may vary with the difficulty sequence, which has barely been examined before. It also looks at how conformity levels change depending on the difficulty of the questions subjects perform first. Lastly, the experiment constructed in this research is distributed online through social media and e-mails. Online access increases the diversity of respondents, meaning that the results of this research can be extended to different audiences.

The remainder of this paper is organized as follows. Section 2 provides a literature review of the research areas investigated in this study. Section 3 describes the experimental design and offers a sample description. Section 4 presents the theoretical expectations. Section 5 reports the results obtained by means of statistical analysis. Section 6 discusses the implications of these results. Finally, conclusions are presented.

2. Literature review

In this section, past research related to this study is reviewed. Since this paper combines different areas, the literature review is divided into five subsections. First, the topic of self-assessment is discussed: psychology shows that people have imperfect knowledge of their ability, a fact that influences individuals' decisions and actions. The consequences of under- or overestimating our abilities are explained in the second subsection. Subsequently, past studies on difficulty as a determinant of performance and on the effect of question order on performance and evaluations of performance are discussed. The fourth subsection reviews the relation between question order based on difficulty and future choices. Lastly, the literature review looks into previous research on the impact of feedback on performance as well as on self-assessment measures.

2.1 Self-assessment of abilities

To understand this study, it is important that the reader is familiar with the literature on people's views of their own abilities and knowledge. The psychology literature shows that people have imperfect knowledge of their abilities. Individuals may have aspirations in life, such as learning a new language, obtaining a perfect score on a test or losing weight in a short time, but they are uncertain about their capabilities to reach these goals. Learning about these capabilities takes time, as people get to know themselves and find out how far they can go.

Imperfect knowledge of abilities has been documented by several authors. Zuckerman (1979) highlights the presence of self-serving biases: humans blame external factors for their failures but attribute success to internal characteristics. Weinstein (1980) explains how people rate their own chances of favorable future events as above average, while rating them below the mean for unfavorable ones. An article by Kruger and Dunning (1999) looks at how people misjudge their abilities across various knowledge domains. Their result, known as the Dunning-Kruger effect, shows that those with low ability inflate their expected performance, while top performers tend to underestimate their abilities compared with their peers. The authors also found that highly skilled people mistakenly assume that tasks which are easy for them are also easy for others, which reinforces the underestimation of their own competence. Kruger (1999) shows that when asked to evaluate their performance at easy tasks (such as riding a bicycle or using a mouse), individuals tend to overestimate themselves; however, for difficult tasks in which absolute skill levels are low (such as computer programming or playing chess), they underestimate their abilities.

2.2 Consequences of self-assessment

People's perceptions of their abilities may have substantial consequences for their choices. The first step to accomplishing a task is to believe that one is capable of doing it, in both social and intellectual domains. A study by Ehrlinger and Dunning (2003) examines women's notions of their own ability and confidence, concluding that women regularly underestimate themselves compared to men with respect to scientific reasoning ability. By undervaluing their abilities, women are more likely to turn down opportunities in the sciences and avoid scientific careers. Li et al. (2007) examined the influence of self-perceptions of ability on the performance of physical activities in a group of children. They find that those with higher self-perceptions of ability performed better, while a low level of self-assessment negatively influenced the way they completed the task.


Finally, authors such as Dominguez-Martinez and Swank (2009) and Fang and Moscarini (2005) have developed models that underline the importance of self-appraisals for the firm and its members. A study by Santos-Pinto and Sobel (2005) suggests that people can have a positive self-image, but also a negative one. Their model shows that as a task gets easier, the individual develops a more positive self-perception.

2.3 Difficulty, question order and evaluations of performance

The view a person has of himself thus influences his actions, decisions and achievements. However, performance also depends on how difficult the task appears to the individual. An individual who can correctly determine the difficulty of a task develops a realistic sense of his abilities, and will be more willing and prepared to take on challenges in both social situations and the work environment.

A substantial body of evidence suggests that task difficulty influences the level of effort exerted, predictions of success or failure, enjoyment of the task, and stress and anxiety. Huber (1985) shows that when a task was difficult, setting a difficult goal led to significantly lower performance among students than setting the same difficult goal for an easy task. Dominguez-Martinez (2009) theoretically examines the difficulty of a task allocated by a manager to a junior employee. The employee can learn about his ability from experience, since past performance on a task contains information about his ability. However, in the model, the junior does not know the difficulty of the performed task, which means he cannot perfectly infer his ability from past performance. The results show that a manager benefits from initially assigning a relatively easy task to talented employees, even though they are able to perform a more difficult one. The reason is that assigning a difficult job may discourage the employee, leading him to underestimate his abilities, lower his self-perception and misjudge the contribution he can make to the organization. Assigning an easy task therefore enhances the junior employee's motivation, as he forms a more optimistic perception of his performance. Li et al. (2007) also look at the role of perceived task difficulty in relation to perceptions of ability. In a physical education class, the authors found that children who viewed a task as less hard were likely to feel more capable and obtained better performance scores. The study also confirms a negative relation between task difficulty and intrinsic value: children who perceived the task as easier were more motivated and enjoyed the task more than those who perceived it as more difficult.

Neuropsychology has also documented the influence of different levels of difficulty on performance. A study by Philiastides et al. (2006) investigates neural activity during a decision-making experiment with varying difficulty. Their results show the presence of a neural component whose activity rises as difficulty increases. Light and Obrist (1983) used timed tasks to show that an increase in task difficulty results in a higher heart rate, stronger cardiac response and higher arterial blood pressure. These strong neuropsychological responses may influence test performance. The Yerkes-Dodson law describes a curvilinear relation between anxiety and performance, in which too much or too little nervousness negatively affects results. Moreover, Morris and Liebert (1970) found that the worry and emotionality resulting from test anxiety deteriorate performance.

Although many authors have looked into difficulty and performance, the literature is not always consistent in its findings. While some of the above-mentioned authors suggest that a more difficult task implies deteriorated results, others, such as Arkes (1979), have found that students in a difficult-questions condition are more interested and more motivated, and therefore obtain higher test scores.

Several researchers have focused on the effect of question order in testing situations. Kennedy and Butler (2013) examine whether question order makes a difference in how students perform on a mathematics test, using academic results from multiple semesters and several different classes. The authors found that the ordering of the questions did not impact test performance. This is also the case in studies by Pettijohn and Sacco (2007) and Weinstein and Roediger (2010), where there were no statistically significant differences between scores on the different exam versions. In contrast, other scholars such as Towle and Merrill (1975) and Balch (1989) do find an influence of question order on performance. Yechiam et al. (2004) investigate the design of training programs that start with easy strategies and move on to more difficult ones. Their results show that starting with an easy strategy yields better performance and lower dropout rates.

The majority of past studies focus on the impact of question order on performance; little attention has been given to its effect on evaluations of performance. Evaluations of performance are a key component of an individual's educational progress, and they have also become increasingly important for career development. Performance appraisals have become common features of the workplace over the last two decades (Prewitt, 2007). They give both employers and employees the opportunity to assess their contributions to the organization, increase productivity in the work team and adjust effort levels in order to reach efficient outcomes. Effective performance judgement systems can also motivate staff to do their best, help personnel set achievement goals for the next period and define proximate challenges. In addition, performance management systems improve the allocation of an organization's resources in order to achieve the highest possible performance.

Although research on the impact of question ordering on evaluations of performance is scarce, some previous work on this topic can be cited. Dean (1973) presented different exam versions with pre-established difficulty levels and compared student perceptions of the test, the course and the instructor. The results show that ordering questions by difficulty influenced judgements of test fairness, teacher evaluations, the worthwhileness of the test and understanding of the material. Pettijohn and Sacco (2007) examined perceived test difficulty, test anxiety and comprehension while varying the order of questions by complexity. Although anxiety did not change with the exam version given, perceptions of task difficulty and understanding of the material were influenced by the question arrangement. Towle and Merrill (1975) find higher levels of excitement during a test arranged hard-to-easy compared to the version progressing in difficulty. Weinstein and Roediger (2012) constructed an experiment with general knowledge questions and varied the arrangement of questions by difficulty. Using a sample of undergraduate students, the authors found that enjoyment ratings decreased as the test became more difficult, and that participants found the test easier on average in the hard-to-easy sequence.

That participants find the hard-to-easy test easier on average can be explained by a behavioral bias known as the anchoring effect. Anchoring has been extensively discussed in behavioral economics by authors such as Tversky and Kahneman (1974). People rarely perceive or judge an event in isolation; instead, they set reference points that influence their decisions. Strack et al. (1988) present an interesting example illustrating that answers to questions can depend heavily on the context set by earlier questions. In an experiment, a group of students was asked two questions in the following order: 1. How happy are you with your life in general? 2. How many dates did you have in the last month? The results showed no correlation between the answers to these questions. However, when the order of the questions was reversed, the authors found a high correlation between the answers: asking first about their dating lives influenced the focus of the second question.

2.4 Question order and choices

In addition, this paper examines whether different question orders based on difficulty influence future choices. In achievement situations, the purpose of an individual is to demonstrate high ability and to avoid demonstrating low ability (Nicholls, 1984). This means that, given the possibility to choose, individuals should rationally select the tasks they expect to maximize their chances of demonstrating high ability and avoid the tasks that would reveal low skill. Such behavior is related to self-assessment of abilities: a person who knows what he is capable of will choose the most challenging tasks, since that is the way to show what he can really do. In contrast, those with a low self-image will refrain from taking the difficult path because they underestimate their capabilities. There are no previous studies on the impact of question ordering on the level of difficulty chosen for future tasks. However, previous research by Hackett and Betz (1995) and Correl (2001) has highlighted the importance of self-assessment in career choices, especially among women: women tend to avoid scientific careers because they underestimate their capabilities.

A possible explanation for inaccurate choices is the lack of information about actual performance. Because the individual is uncertain about his abilities, he may overestimate or underestimate himself. Providing feedback may remove this uncertainty and lead to more accurate choices. Consequently, informing individuals about their progress may influence their decisions as well as their performance.

2.5 Feedback and self-assessment

Feedback can influence an individual's self-assessment. In education and sports, feedback is employed to improve students' and athletes' performance, and as part of the performance appraisal process many companies give their staff feedback about their progress at work. Some authors suggest that providing performance feedback may harm outcomes in organizations, as it can negatively impact a person's self-esteem (Kluger and DeNisi, 1996; Smither et al., 2005). However, there is considerable empirical evidence for feedback having a positive effect on performance. Kuhnen and Tymula (2012) address how feedback and self-esteem considerations interact to influence employee performance. They show that when feedback was provided, people worked harder relative to cases in which it was not. Their second result shows that the impact of feedback varies with self-esteem: individuals who performed better than expected (high self-esteem) decreased output in the next period but worked harder over time, while those who performed worse than expected increased output in the short term but did not improve their overall productivity. These findings suggest that feedback can be used strategically to improve employee performance, depending on workers' actual and perceived abilities. Karl et al. (1993) looked at the impact of feedback on self-efficacy and performance during training in a speed reading class. In a group of forty children, the authors found that those who received effort feedback exhibited greater task persistence than those given no feedback. Escartí and Guzmán (1999) investigated the effect of positive and negative feedback on performance and task choice in an athletic activity. They found that players who received positive reviews showed higher self-efficacy and performed the task better than athletes who got negative comments. Moreover, when participants were presented with three tasks of ascending difficulty, those who received positive feedback and had a high self-image chose to perform the more difficult tasks. These results imply that feedback can be used as a motivational variable in several domains to improve performance.

According to Erez (1977), knowledge of scores is a necessary condition for goals to affect performance, as "it facilitates the display of individual differences in self-goals on the basis of knowledge of individual performance." The study involved solving a computation task in two parts. The findings indicated that in the second part of the experiment, after subjects set their goals for the next task, subjects who were given feedback attained higher performance and set more difficult targets than subjects who received no information about their first-part results. On the question of whether feedback is strictly necessary, an interesting paper by Weber (2000) examines whether convergence towards equilibrium behavior can occur in games played repeatedly without any feedback between periods. In an experiment with four treatments, participants were divided into three no-feedback conditions and one feedback condition. Although convergence to equilibrium was greater in the condition in which subjects received feedback, it also took place in all other treatments. These results show that feedback is an important mediator for learning, but that its presence might not be critical for individuals to improve performance.


3. Experimental design

This section explains the data collection method used in this paper. It also contains a description of the participants in the sample, the different treatment conditions and the experimental procedure. The experiment was constructed with the online survey software Qualtrics and distributed to participants through social media platforms, university e-mails and blogs. It was available for completion for a period of two weeks in May 2016.

3.1. Tasks and incentive scheme

Participants have to complete seven tasks. Each correctly solved problem earns 1 point, and each task is worth at most 10 points, so the maximum score on the test is 70 points. Other studies have used general knowledge questions; however, as this experiment is online, the answers to such questions can easily be found on the Internet, which would invalidate the results. To avoid this issue, this research uses mathematical problems that must be solved in a limited amount of time. The main reason for a time restriction is to increase the variation in test scores: with no time limit, participants would most likely find the correct answers to all problems and final results would be very similar. Another reason is that participants may get distracted or bored if the questions take very long to complete; a fixed time per task keeps participants focused. In each task, participants have one minute to solve as many problems as possible. Including the judgement-of-performance questions and a questionnaire presented at the end, the experiment takes approximately 10 to 15 minutes.

Participants solve three types of tasks. The first type consists of adding up sequences of 5 two-digit numbers (difficult) or 2 two-digit numbers (easy), depending on the difficulty of the question; participants type their answer into a textbox. The second type involves completing mathematical sequences of five numbers, which can be easy (such as 22, 33, 44, 55, 66) or harder to determine (such as 5, 20, 50, 110, 230, where each number is increased by 5 and then multiplied by 2). In the last type, participants must find the missing number X in an equation of 2 two-digit numbers if the task is easy, or 5 two-digit numbers if the task is difficult. For the last two types, participants select the correct answer out of three options in a multiple-choice format.
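The task types above can be sketched in a few lines of code; the function names and generation logic below are illustrative only, not the thesis's actual Qualtrics implementation.

```python
import random

def make_addition_problem(difficulty):
    """Generate an addition task: 2 two-digit terms (easy) or 5 (difficult)."""
    n = 2 if difficulty == "easy" else 5
    terms = [random.randint(10, 99) for _ in range(n)]
    return terms, sum(terms)

def make_sequence_problem(start, rule, length=5):
    """Build a five-number sequence by repeatedly applying `rule`."""
    seq = [start]
    while len(seq) < length:
        seq.append(rule(seq[-1]))
    return seq

# The easy example from the text: a constant step of 11.
easy_seq = make_sequence_problem(22, lambda x: x + 11)   # [22, 33, 44, 55, 66]
# The difficult example: each number is increased by 5, then multiplied by 2.
hard_seq = make_sequence_problem(5, lambda x: (x + 5) * 2)  # [5, 20, 50, 110, 230]
```

Encoding the difficult rule as a function makes it easy to verify that the published example sequence is internally consistent.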

Figure 1 presents a summary of the experiment. The first 3 tasks are either easy or difficult, depending on the treatment (E-D or D-E), and the same holds for the last 3 tasks. Participants make judgements about their performance and perceptions after every three questions, thus twice during the test. Participants in the feedback condition are shown their score up to that point before continuing with the following tasks. After finishing task 6, subjects again evaluate their performance, and the score on the six questions answered is then shown to those in the feedback condition. Task 7 offers subjects a choice between two tasks, A and B. Task A consists of adding up sequences of 5 two-digit numbers, with each correct answer worth 1 point; as this task is more difficult but a correct answer is more valuable, it should only be attractive to those with high ability. Task B consists of adding up sequences of 2 two-digit numbers, an easier task that gives 0.5 points per correct answer. When all tasks are completed, the final score is presented to all participants, regardless of treatment. Lastly, participants fill in a questionnaire about gender, age, major of studies, previous knowledge of the research and high school math grade. A summary of the tasks in each treatment and the instructions of the experiment can be found in Appendix A and Appendix F, respectively.
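The trade-off in task 7 can be made concrete with a small expected-score calculation. The solve rates below are made-up numbers for illustration; only the point values (1 for A, 0.5 for B) come from the design.

```python
def expected_points(task, expected_correct):
    """Expected score in the one-minute round: Task A pays 1 point per
    correct answer, Task B pays 0.5 points per correct answer."""
    points_per_correct = {"A": 1.0, "B": 0.5}
    return points_per_correct[task] * expected_correct

# Hypothetical solve rates: a participant who expects 4 correct hard sums
# per minute versus 9 correct easy sums should prefer Task B here.
prefer_a = expected_points("A", 4) > expected_points("B", 9)  # False: 4.0 < 4.5
```

The choice therefore reveals something about self-assessed ability: only participants who believe their hard-sum rate exceeds half their easy-sum rate should pick Task A.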

Figure 1

Timeline of the experiment

To incentivize participants, three subjects were randomly selected for a payout. Participants who want to be eligible for a prize are asked to enter their e-mail address so that the experimenter can contact them. An incentive scheme based on relative performance was implemented: individual performance is compared to that of the other participants. Individuals who perform better than 75% of their fellow participants have a chance to win a 15 euro gift card from web shop bol.com. If performance falls between the 75th and 50th percentiles, there is a chance to win a 10 euro gift card, and between the 50th and 25th percentiles a 5 euro gift card. Individuals in the bottom quartile receive no reward. From each of the three prize groups, one participant was randomly selected to win a gift card of the corresponding value.
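The prize tiers can be sketched as a small function. The percentile definition and tie/boundary handling below are assumptions, since the text does not specify them.

```python
def percentile_rank(score, other_scores):
    """Share of fellow participants with a strictly lower score
    (one possible convention; ties are counted as not outperformed)."""
    return sum(s < score for s in other_scores) / len(other_scores)

def gift_card_value(score, other_scores):
    """Map relative performance to the gift-card tiers described above."""
    p = percentile_rank(score, other_scores)
    if p > 0.75:
        return 15  # euro
    if p > 0.50:
        return 10
    if p > 0.25:
        return 5
    return 0       # bottom quartile: no reward
```

With other scores of [10, 20, 30, 40], for example, a score of 45 outperforms everyone and lands in the 15 euro tier, while a score of 15 outperforms only a quarter of the others and earns nothing.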

3.2. Treatments and performance measures

This experiment uses a between-subject design with four treatments, crossing question order with feedback. Treatment E-D starts with easier tasks and progresses to more difficult ones, while treatment D-E has the difficult questions at the beginning and finishes with easier tasks. The second manipulated variable is the display of the participant's intermediate score: there is a feedback condition and a no-feedback condition. Participants are randomly assigned to the treatments.
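One way to implement the random assignment to the four cells is sketched below; the names are illustrative, and Qualtrics handles this internally via its randomizer.

```python
import random

# The four between-subject treatments: question order crossed with feedback.
ORDERS = ["E-D", "D-E"]
FEEDBACK = [True, False]
TREATMENTS = [(order, fb) for order in ORDERS for fb in FEEDBACK]

def assign_treatment(rng=random):
    """Randomly assign an incoming participant to one of the four cells."""
    return rng.choice(TREATMENTS)
```

Uniform random assignment keeps the four cells balanced in expectation, which is what the between-subject comparisons in Section 5 rely on.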

Also, several measures indicating evaluation of performance were compared between the easy-to-difficult and difficult-to-easy conditions. Similar evaluation of performance measures have been used by Weinstein and Roediger (2010, 2012) and Li et al. (2007). Participants answer the perception questions on a Likert scale ranging from 1 to 5. The first measure is self-perception of ability: subjects answer the question "how good do you think you are doing in this test?" on a scale with extremes labeled bad (1) and excellent (5). The second measure relates to perceptions of task difficulty ("How difficult are you finding this test?"), with choices ranging from very easy (1) to very difficult (5). Moreover, the questions "how much are you enjoying this test?" and "how motivated are you to continue with this test?" are presented to the participant, with possible answers from not a bit (1) to extremely (5). Performance expectancy is measured with the question "What do you think will be your overall score?". A final question measures conformity levels compared between the two order conditions: "with how many correct answers will you be satisfied at the end of this test?"


3.3. Sample description

Table 1 presents a description of the data in terms of responses, gender, educational background and age of participants. The distribution method of this experiment allows for great variation in the characteristics of the subjects, which means that the results obtained in this study can be generalized to a broader audience. There are 106 responses in total, of which 92 are fully completed and 14 are partially completed.

Table 1
This table presents a description of the data in terms of responses, gender, educational background, age and high school math grade of participants. Percentages of the sample are shown in parentheses; all percentages are rounded to integers. Part A shows finished and partial responses. Part B shows background characteristics for all completed responses.

Part A: Responses
Finished responses    92 (87)
Partial responses     14 (13)
Total responses      106

Part B: Background characteristics
Gender              Educational background        Years of age        Math grade
Male      37 (40)   Economics         27 (30)     18-30    51 (55)    Lower than 6.0    5 (5)
Female    55 (60)   Politics and Law   6 (6)      31-45    11 (12)    6.0 - 7.0        27 (29)
Total     92        Applied Sciences  14 (15)     46-60    18 (20)    7.0 - 8.0         1 (1)
                    Medicine           1 (1)      >60      12 (13)    8.0 - 9.0        38 (41)
                    Biology            6 (6)      Total    92         Higher than 9.0  21 (23)
                    Other             38 (41)                         Total            92
                    Total             92

In a partially completed response, the participant did not reach the end of the experiment. Therefore, no information is available about the background characteristics of that participant. However, it is possible to know to which question order and feedback condition the participant was assigned. This is an advantage of this experimental setting as compared to a laboratory experiment. Because only completed responses are useful for


the objectives of this study, the 14 partial answers are not taken into account in the results. A summary of the number of participants per treatment condition from completed and partial responses is presented in Table 2. The reader will notice a large difference between the number of participants in treatments E-D and D-E. This difference is mainly the result of randomization, but it can also be explained by a difference in dropout rates: out of the 14 incomplete responses, 8 corresponded to the D-E treatment and 6 to the E-D condition. A detailed description of participants' background characteristics in each treatment condition can be found in Appendices B and C.

Table 2
This table presents the number of participants in each treatment for complete and partial responses.

                Complete responses               Partial responses
         Feedback   No feedback   Total   Feedback   No feedback   Total
E-D         24          33          57        4           2           6
D-E         16          19          35        3           5           8
Total       40          52          92        7           7          14

4. Hypotheses

The main interest of this study lies in how ordering questions by difficulty impacts test scores. Towle and Merrill (1975), Balch (1989) and Yechiam et al. (2004) show that starting with easier tasks improves test performance. It is therefore expected that participants who start with the easier questions and progress to more difficult ones will obtain a higher score. Starting with an easy task may enhance subjects' confidence and prevent them from becoming demotivated (Dominguez-Martinez, 2009), resulting in better overall performance. On this basis, the first hypothesis is:

H1. In the E-D treatment, final test scores will be higher relative to those in the D-E treatment.

In order to evaluate self-perceptions of ability, two measures are used. First, participants are asked to rate themselves by answering the question "how good are you doing in this test?". The other measure for perceived ability is performance expectancy


levels, for which subjects are asked to state the score they think they have obtained in the test. Weinstein and Roediger (2012) show that participants starting with easy questions consistently rate their performance throughout the test as higher than those beginning with harder problems. The authors also show that subjects in the D-E condition found the test more difficult, whereas those in the E-D condition said it was easy. Based on these findings, it is expected that:

H2. The percentage of participants with high self-perceptions of ability will be higher in the E-D treatment than in the D-E treatment at both evaluation moments.

H3. The percentage of participants perceiving the overall test as difficult will be higher in the D-E treatment than in the E-D condition at both evaluation moments.

The next hypothesis relates to levels of enjoyment and motivation. Weinstein and Roediger (2012) present a result related to enjoyment ratings: in their paper, those in the E-D condition enjoyed the test less than participants in the D-E condition. With respect to motivation, Li et al. (2007) show that as a task gets easier, intrinsic value increases. Therefore, it is expected that:

H4 (a). In the D-E treatment, levels of enjoyment and motivation will increase throughout the test. (b) In the E-D treatment, levels of enjoyment and motivation will decrease as the test progresses.

The fifth hypothesis refers to the level of conformity with final scores. Li et al. (2007) show that when the task seems easy, the individual perceives his performance as higher. Phillips (1984) states that children with a low perception of their ability adopt lower standards and hold lower expectations for success than children with a more positive view of their ability. Therefore, it is expected that:

H5. In the D-E treatment, participants will be satisfied with a lower number of correct answers than those in the E-D treatment, at both evaluation moments.

Moreover, it is expected that, when given the opportunity to do so, the difficult task will be chosen more frequently by those in the E-D condition and the easy task will be chosen more frequently in the D-E treatment (H6). If those in the E-D treatment indeed have higher self-perceptions of their abilities (Weinstein and Roediger, 2012), they are expected to choose the task with higher difficulty. In contrast, those with a low perception of their abilities will prefer the easy task. This behavior is consistent with Ehrlinger and Dunning (2003). In their article, the authors explain that women avoid


careers in science because they underestimate their scientific reasoning ability. This means that misjudging one´s abilities makes individuals choose differently.

The last hypothesis of this paper relates to the impact of feedback on performance. Kuhnen and Tymula (2012) find that the effect of feedback depends on the individual's self-esteem considerations. Their results show that when the feedback received is better than expected, it has a positive effect on performance; when subjects performed worse than expected, feedback has a negative effect on performance. Subjects in the D-E condition are expected to perceive the test as more difficult (see H3), meaning that they will underestimate their performance (Kruger, 1999). In the same way, subjects in the E-D condition will see the test as easier and tend to overestimate their performance. Hence, it is expected that:

H7 (a). Participants under the D-E condition will have a higher performance if feedback was shown than if they were in the no-feedback condition. (b) Participants under the E-D condition will have a lower performance if feedback was shown than if they were in the no-feedback condition.

5. Results

In this section the empirical results of the experiment are presented. First, summary statistics with means and standard deviations of performance and evaluation of performance variables are presented in Table 3. Evaluation of performance variables are measured at two moments: after the first three tasks (initial values) and after the sixth task (overall values). At the first measurement point, participants are asked about their progress so far; at the second, they are asked about their perception of their overall performance. Performance and expected score are measured in points, equivalent to the number of correct answers obtained by the participant. Conformity level refers to the number of correct answers with which the participant will be satisfied at the end of the test. Motivation, enjoyment, perception of difficulty and perception of ability are measured on a Likert scale ranging from 1 to 5.

Intermediate performance refers to the score participants had in all treatments after completing the sixth task, irrespective of their feedback condition. Due to an error made in the Qualtrics platform during the experimental design, intermediate scores were shown to participants during the experiment but were not saved in the downloadable dataset. To recover intermediate scores up to task 6, it is necessary to calculate


the performance of each participant in the seventh task and subtract it from the final score. Data on the final performance of each individual is available; performance in the choice task, however, has to be extracted separately from each collected response. To minimize manual calculations, a simple MATLAB script was implemented. Note that scores for this task depend on the choice the participant made: A or B. In Task A (difficult), correct answers are rewarded with one point each, while in Task B (easy) correct answers are worth 0.5 points. The details of the process are described in Appendix E of this paper.
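The author's reconstruction was done in MATLAB; the same arithmetic can be sketched in Python as follows. The function names are hypothetical, but the scoring rule (1 point per correct answer in Task A, 0.5 points in Task B) is taken directly from the text.

```python
def task7_score(choice, n_correct):
    """Points earned in the choice task: Task A (difficult) pays
    1 point per correct answer, Task B (easy) pays 0.5 points."""
    points_per_answer = 1.0 if choice == "A" else 0.5
    return n_correct * points_per_answer

def intermediate_score(final_score, choice, n_correct):
    """Score after task 6 = final score minus task-7 points."""
    return final_score - task7_score(choice, n_correct)

# A participant who finished with 26.5 points and solved 4 problems
# in the easy task (B) had an intermediate score of 26.5 - 2.0 = 24.5.
print(intermediate_score(26.5, "B", 4))  # 24.5
```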

To examine the relationship between the evaluation of performance measures, expected performance and scores, Spearman non-parametric correlations were used. The correlation coefficients for these variables are presented in Table 4. The results show that scores are positively correlated with initial and overall measures of perception of ability and with conformity levels at the 5% significance level. This means that the better the individual thinks he is performing, the higher his final score. Also, standards for the number of correct answers are positively correlated with performance. In contrast, initial and overall perceptions of difficulty have a negative relationship with performance.
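For illustration, the Spearman coefficient can be computed with the classic rank-difference formula. This is a simplified sketch that assumes no tied observations; the values in Table 4 come from a full implementation that handles ties, as Likert-scale data generally contains them.

```python
def spearman_rho(x, y):
    """Spearman rank correlation via the classic formula
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    valid when neither sample contains ties."""
    n = len(x)

    def to_ranks(values):
        # Rank 1 = smallest value; ties are not handled in this sketch.
        order = sorted(range(n), key=lambda i: values[i])
        ranks = [0] * n
        for rank, i in enumerate(order, start=1):
            ranks[i] = rank
        return ranks

    rx, ry = to_ranks(x), to_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# A perfectly decreasing relationship yields rho = -1.
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```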

Table 3

Means and standard deviations for performance and evaluation of performance measures at both points of evaluation

Variable                           Mean    Std. dev.   Min    Max
Initial perception of ability       2.49      0.98       1      5
Initial perception of difficulty    2.90      1.14       1      5
Initial enjoyment                   3.22      1.16       1      5
Initial motivation                  3.53      1.03       1      5
Initial expected performance       13.54      7.60       0     30
Initial conformity level           46.33     17.50       0     70
Overall perception of ability       2.10      1.09       1      5
Overall perception of difficulty    3.33      1.27       1      5
Overall enjoyment                   2.98      1.19       1      5
Overall motivation                  3.07      1.27       1      5
Overall expected performance       22.93     11.01       1     50
Overall conformity level           38.12     15.57       0     70
Final performance                  26.69      8.57       4    42.5

Table 4
Spearman's rank correlation coefficients for scores and evaluation of performance measures. Note: statistical significance at the 10% and 5% levels is denoted by * and ** respectively.


This finding shows that the more difficult the test seemed to a participant, the lower his score. There were no significant correlations between expected scores and any of the variables of interest. Levels of conformity were positively correlated with overall enjoyment and perception of ability at the 10% and 5% significance levels respectively. Global perceptions of task difficulty, however, were negatively related to levels of conformity, meaning that participants were satisfied with a lower number of correct answers if the test was perceived as difficult. Motivation and enjoyment were positively related, with a correlation coefficient of 0.5836; both of these variables were negatively influenced by perception of difficulty. Lastly, perception of ability was negatively associated with perception of difficulty.

Two non-parametric tests are used to examine the validity of the hypotheses presented above. The Wilcoxon rank-sum test (or Mann-Whitney test) is used as an alternative to the t-test when the distribution of the population cannot be assumed to be normal. The Pearson chi-squared test of association is employed to evaluate associations between categorical variables. In the remainder of this section the results of these tests are discussed in detail.
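The statistic behind the Wilcoxon rank-sum / Mann-Whitney test can be sketched as follows. This is a minimal illustration of the U statistic only; the p-values reported in the following subsections additionally require its null distribution, as provided by standard statistical software.

```python
def mann_whitney_u(x, y):
    """U statistic of the Mann-Whitney test: the number of pairs
    (x_i, y_j) with x_i > y_j, counting exact ties as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# If every value in x exceeds every value in y, U equals len(x) * len(y).
print(mann_whitney_u([3, 4, 5], [1, 2]))  # 6.0
```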

5.1. Question order and performance

Performance was measured in total points, equivalent to the number of correct answers obtained by the participant. Table 5 shows average final and intermediate scores in each treatment. The third column shows p-values from Wilcoxon rank-sum tests comparing the E-D and D-E conditions within each feedback condition, and the last rows show p-values comparing the feedback and no-feedback conditions. For the no-feedback condition, p-values of 0.9621 and 0.9243 were obtained for final and intermediate performance respectively, meaning that there is no significant difference between the E-D and D-E conditions on either variable. Similarly, final and intermediate scores do not differ significantly for participants who received information about their mid-test performance. These results lead to a rejection of the first hypothesis of this paper: question order did not influence performance.

Furthermore, Table 5 contains information that allows hypothesis 7 to be tested. It was expected that participants in the D-E treatment would have a higher score if feedback was shown than if they were in the no-feedback condition (7a). Final performance for D-E was on average 25.86 and 27.81 in the no-feedback and feedback conditions respectively. The


results of a Wilcoxon rank-sum test give a p-value of 0.6667. Part (b) of hypothesis 7 stated that participants under the E-D condition would have a lower score if feedback was shown than if they were in the no-feedback condition. Final performance for E-D was on average 26.86 and 26.35 in the no-feedback and feedback conditions respectively, and the p-value obtained from comparing these two average scores was 0.9099. Both parts of the hypothesis are therefore rejected: in neither case is there a significant difference in performance.

Table 5
Final performance and intermediate performance for question order and feedback conditions.

                                          E-D      D-E     p-value
No feedback   Final performance          26.86    25.86     0.9621
              Intermediate performance   23.27    22.63     0.9243
Feedback      Final performance          26.35    27.81     0.7613
              Intermediate performance   22.89    24.46     0.6286

No feedback vs. feedback: final performance p = 0.7516; intermediate performance p = 0.8841.

5.2. Perceptions of ability

There were two ways of evaluating participants' perceptions of their performance in the test: perceptions of ability and expected performance. Table 6 shows the percentage of outcomes for perceptions of ability at both evaluation moments. Initial values are taken from the first measurement point after task 3, and overall values are global judgements of the test after task 6. It was expected that a higher percentage of participants with high self-perceptions of ability would be found in the E-D treatment at both evaluation moments. Initially, 18%+2%=20% of participants in the E-D treatment thought they were doing good or excellent, while in the D-E condition only 3%+0%=3% rated their performance in the same categories. This is expected and follows logically, since the first three questions of the test are either easy or difficult. However, overall judgements of the test show that 11%+14%=25% of the participants rated their performance as bad or weak in the D-E treatment, compared to 54%+36%=90% in the E-D treatment. In the D-E


treatment, 34%+3%=37% of the participants thought their overall performance in the tasks was either good or excellent, while in the E-D condition none of the participants rated themselves in these categories. The last rows of the table show p-values comparing initial and overall ratings in the E-D and D-E treatments. The Wilcoxon rank-sum tests give p-values of 0.000 and 0.005 respectively, meaning that both initial and overall perceptions of ability differ significantly depending on the question order arrangement. Although participants in the E-D treatment thought their performance was better at the first evaluation moment, they judged their overall performance as worse.

Table 6

Percentage of outcomes and p-values for initial and overall perceptions of ability in question order treatments.

                       Initial D-E   Initial E-D   Overall D-E   Overall E-D
Bad                         46             4            11            54
Weak                        20            30            14            37
Neither good nor bad        31            47            37             7
Good                         3            18            34             0
Excellent                    0             2             3             0

p-values (E-D vs. D-E): initial 0.000; overall 0.005.

The other measure for perception of ability was expected performance. Table 7 shows initial and overall expected performance for participants in the D-E and E-D treatments. At the first evaluation moment, the average expected performance was 16.89 under E-D and 8.20 under D-E, a significant difference corresponding to a p-value of 0.000. However, overall test judgements show that participants expected an average of 21.18 and 24.01 points in the D-E and E-D conditions respectively. A Wilcoxon rank-sum test shows no significant difference between the overall scores expected by participants in the two treatments (p-value = 0.24).

Table 7
Initial and overall expected performance (in scores) in question order treatments.

                               E-D      D-E     p-value
Initial expected performance  16.89     8.20     0.000
Overall expected performance  24.01    21.18     0.24

In addition to these measures, the estimation of own ability is investigated. Absolute underestimation and overestimation are obtained by comparing expected scores at the second evaluation moment to true intermediate scores after participants completed task 6. If a participant's expected score is above his true performance, he is overestimating himself; if, in contrast, his true score exceeded his expectations, he is underestimating himself. The number of participants in each question order and feedback treatment is shown in Table 8. A chi-squared test of association is computed to evaluate whether subjects underestimated or overestimated themselves more often in any of the four treatments. The results give a p-value of 0.582 for the comparison between feedback and no-feedback conditions, and a p-value of 0.559 for differences between E-D and D-E. In both cases, these findings show that overestimation and underestimation do not significantly differ among the treatments. On the other hand, the data show that those with a higher number of correct answers underestimate themselves more often in both treatments. This corresponds with Kruger and Dunning (1999), who find that those with high ability usually underestimate what they can achieve.

Table 8
Participants suffering from under- or overestimation in each question order and feedback condition. The table shows the number of participants in all treatments and p-values comparing feedback vs. no feedback and E-D vs. D-E.

                               E-D    D-E   Total
No feedback   Underestimation   18     11      29
              Overestimation    15      8      23
              Total             33     19      52
Feedback      Underestimation    9     11      20
              Overestimation     7     13      20
              Total             16     24      40

p-values: feedback vs. no feedback 0.582; E-D vs. D-E 0.559.
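The classification of participants for Table 8 can be sketched as a simple comparison. The function name is hypothetical, and the third label for exact matches is an assumption, since the text only distinguishes over- and underestimation.

```python
def estimation_bias(expected_score, true_score):
    """Compare a participant's expected score after task 6 with his
    true intermediate score. Exact matches are labelled separately,
    since the text only discusses over- and underestimation."""
    if expected_score > true_score:
        return "overestimation"
    if expected_score < true_score:
        return "underestimation"
    return "accurate"

# A participant expecting 30 points with a true score of 24 overestimates.
print(estimation_bias(30, 24))  # overestimation
```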

5.3 Perception of task difficulty

Participants were asked about their perception of the difficulty of the test. Figure 2 shows initial and overall ratings for this variable, measured as the percentage of choices in each category of the scale. It was expected that a higher number of participants would rate the experimental task as difficult in the D-E treatment compared to the E-D treatment, both at the first evaluation moment and when judging the overall difficulty of the test. Initially, 20%+37%=57% of participants in the D-E treatment found the test either moderately difficult or very difficult. This is expected and follows logically, since the first three questions of the D-E version of the test are in fact more difficult. However, only 20%+6%=26% of subjects in the D-E treatment judged the overall level of the test as either very difficult or moderately difficult, compared to 37%+30%=67% in the E-D condition. The Wilcoxon rank-sum tests give p-values of 0.0008 and 0.000 for initial and overall differences in perceptions of difficulty respectively.

Figure 2
Initial and overall perceptions of task difficulty: percentage of choices in each category (very easy to very difficult), per question order treatment.

5.4 Motivation and enjoyment

The hypotheses for motivation and enjoyment stated an increase or a decrease in the levels of these variables throughout the test, depending on the question order condition. To test whether this is indeed the case, the difference between overall and initial levels of these variables is computed. In the D-E treatment, differences are expected to be positive, since enjoyment and motivation should increase throughout the test; in the E-D condition, differences are expected to be negative. Table 9 shows the percentage of positive, negative and no differences in each treatment.

Table 9

Percentage of outcomes in positive, negative and no differences for enjoyment and motivation level

Enjoyment Motivation

D-E E-D D-E E-D

Positive differences 37% 9% 23% 9%

No differences 46% 40% 51% 40%

Negative differences 17% 44% 26% 51%

A chi-squared test was computed to test for an association between changes in the levels of these variables and the D-E and E-D treatments. The results show p-values of 0.007 and 0.008 for enjoyment and motivation respectively. This means that whether these variables increase or decrease depends significantly on the question order condition. In particular, enjoyment (motivation) increased for 37% (23%) of the participants in the D-E treatment, while for those in the E-D condition enjoyment (motivation) decreased in 44% (51%) of the cases. Hypothesis 4 is therefore confirmed by the data in this sample.
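The Pearson statistic underlying this test can be computed directly from a contingency table of outcome counts. This sketch shows the statistic only; the reported p-values additionally require the chi-squared distribution with the appropriate degrees of freedom.

```python
def chi2_statistic(table):
    """Pearson chi-squared statistic for a contingency table given as
    a list of rows; expected counts assume row/column independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# A table with identical rows shows no association: the statistic is 0.
print(chi2_statistic([[10, 10], [10, 10]]))  # 0.0
```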

5.5 Levels of conformity

Levels of conformity are measured in total points. Participants are asked with how many correct answers (or points, since each correct answer is one point) they will be satisfied at the end of the test. As Table 10 shows, subjects in the E-D and D-E treatments are satisfied with an average of 51.5 and 41.5 points respectively at the first evaluation moment. This significant difference (p-value 0.0007) indicates that those in the E-D treatment had higher standards than those in the D-E treatment. However, overall levels of conformity did not differ significantly between treatments: the averages are 36 and 38 points for the E-D and D-E conditions respectively.

(28)

28

Moreover, levels of conformity decreased in both question order arrangements. This decline is particularly large for those who had the easier questions at the beginning: they started with a satisfaction threshold of 51.5 points but at the end of the test would have been satisfied with 36 points, a difference of 15.5. Participants starting with harder questions lowered their standards as well, but only by 3.5 points.

Table 10

Average initial and overall levels of conformity (measured in points) in each question order condition.

                              E-D    D-E   p-value
Initial level of conformity   51.5   41.5   0.0007
Overall level of conformity   36     38     0.1552

5.6 Task choice

The last question of the experimental task offered participants the possibility to choose between Task A and Task B, which differ in their level of difficulty and in the value of each correct answer. Task A is a difficult task in which each correct answer is worth one point; Task B is an easy task that rewards each correct answer with 0.5 points. Table 11 shows the distribution of task choices per question order and feedback condition in percentages. A detailed table summarizing task choices can be found in Appendix D.

The easier task was chosen by 74% of participants in the E-D treatment and by 51% in the D-E treatment; the difficult task was chosen by the remaining 26% and 49% respectively. These results show that in the E-D treatment a significant majority of participants preferred the easy task. Although the difficult task was chosen by fewer participants in both treatments, Task A was chosen more frequently in the D-E than in the E-D condition: the Wilcoxon rank-sum and chi-squared tests give a p-value of 0.03. Moreover, the same statistical tests indicate that task choice does not differ significantly between feedback treatments, as can be observed from the p-value of 0.18 in Table 11.


Table 11
Percentage of outcomes in task choice per question order and feedback conditions.

                     E-D    D-E        No feedback   Feedback
Task A (Difficult)    26     49             29          42
Task B (Easy)         74     51             71          58

p-values: feedback vs. no feedback 0.18; E-D vs. D-E 0.03.

In addition, a probit regression analysis is performed in order to identify which variables influence participants' task choice. The dependent variable is Task Choice. The regression equation is:

TaskChoice_i = α + β1(QuestionOrder) + β2(Feedback) + β3(IntermediatePerformance) + β4(OverallMotivation) + β5(OverallEnjoyment) + β6(OverallPerceptionAbility) + β7(OverallPerceptionDifficulty) + β8(OverallExpectedPerformance) + β9(OverallConformityLevel) + e_i

Table 12 shows the regression coefficients obtained; statistical significance at the 5% level is denoted by two stars. The first column shows the impact of the question order arrangement on task choice alone. The second and third columns add feedback and intermediate scores. Column 4 contains regression coefficients for the evaluation of performance measures. Only the overall values of these variables are taken into account in this analysis, because participants are given the chance to choose tasks only after the second evaluation moment; it is the global levels of these measures, not the initial judgements, that may influence task choice. In the first three columns, question order is statistically significant, with coefficients of 0.597, 0.594 and 0.600 respectively. In the last column, however, the E-D and D-E treatment variations do not impact task choice. The remaining coefficients do not show any statistical significance, since their corresponding p-values lie above the 5% level. These insignificant results may, however, be explained by the small size of the sample used in the analysis.

Table 12
Determinants of task choice

This table presents the estimates of a probit regression model with multiple regressors. The dependent variable is Task Choice. Statistical significance at the 5% level is denoted by **.

5.7 Feedback and evaluation of performance measures

In previous sections, differences between feedback conditions have been considered for final, expected and intermediate performance, and for task choice. In this section, differences between feedback conditions in the evaluation of performance variables are examined. For motivation, enjoyment, perceptions of difficulty, perceptions of ability, expected scores and conformity levels, comparisons have been made between initial and overall values. However, feedback cannot influence initial levels of these variables, because it is shown (or not) to participants only after the first three questions are completed. Therefore, only overall measurements are taken into account. Averages and p-values for evaluation of performance measures in the feedback and no-feedback treatments are presented in Table 13. All reported p-values range between 0.29 and 0.79, meaning that none of the variables differ statistically significantly between feedback conditions.

Dependent variable: Task Choice

                                    (1)       (2)       (3)       (4)
Constant                          -0.633    -0.796    -0.505    -0.104
Question order                     0.597**   0.594**   0.600**   0.369
Feedback                                     0.363     0.377     0.457
Intermediate performance                              -0.013    -0.021
Overall motivation                                              -0.211
Overall enjoyment                                                0.104
Overall perception of ability                                    0.084
Overall perception of difficulty                                -0.115
Overall expected performance                                     0.001
Overall conformity level                                         0.009
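In a probit model, the linear index of the regressors is mapped to a choice probability through the standard normal CDF. A minimal sketch of that link function follows; the function name and the coefficient used in the example are illustrative, not the fitted estimates of Table 12, which are obtained by maximum likelihood in a statistical package.

```python
import math

def probit_probability(regressors, betas, constant=0.0):
    """P(TaskChoice = 1) = Phi(constant + x'beta), where Phi is the
    standard normal CDF -- the link function of the probit model."""
    index = constant + sum(x * b for x, b in zip(regressors, betas))
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))

# With a zero linear index, the model predicts a 50% choice probability.
print(probit_probability([0.0], [0.597]))  # 0.5
```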

Table 13

Averages and p-values for evaluation of performance measures in feedback and no feedback treatments

Feedback No feedback p-value

Overall expected performance 23.05 22.71 0.43

Overall conformity level 36.00 39.63 0.79

Overall motivation 3.42 3.03 0.29

Overall enjoyment 3.20 2.80 0.38

Overall perception of difficulty 3.15 3.46 0.67

Overall perception of ability 2.02 2.15 0.73

6. Discussion

The results reported above show that, at least for this sample, arranging questions in ascending or descending difficulty did not influence actual performance. Although this finding does not match expectations, it is in line with research by Weinstein and Roediger (2010), Jackson and Greene (2014) and Pettijohn and Sacco (2007), who found that performance did not differ significantly as a function of question order. Based on Weinstein and Roediger (2012), it was expected that participants starting with more difficult questions would find the experimental task more difficult and would consistently rate their performance less optimistically compared to subjects in the E-D treatment. Those authors concluded that their results support an anchoring explanation, in which perceptions of ability and difficulty of the test as a whole are heavily influenced by the impressions formed at the beginning of the testing situation. If initial impressions are those of an easy test, expected performance will be higher and perceived difficulty lower than if the initial impressions are those of a difficult test. This explanation suggests that participants fail to adjust their evaluations of performance as the difficulty of the questions changes across the test.

In this paper, participants in the D-E treatment perceived their performance as worse than those in the E-D treatment at the first moment of evaluation. However, the results of the present study show a reversed effect for global judgements: participants' overall impressions of performance were higher when the test began with harder items, and the test was perceived as easier in that same question order arrangement. A possible explanation for these findings relates to the availability and retrievability heuristics introduced by Tversky and Kahneman (1974). There are situations in which people assess the probability and frequency of an event based on how easily instances can be brought to mind.


These behavioral biases lead people to form conclusions using the information that is most readily available in memory. In a similar way, when test-takers are asked to evaluate overall performance, they rely on their impressions of performance on the most recent questions in the test. This means that global judgements are heavily influenced by the easy or difficult items at the end of the test. Participants in the D-E condition therefore give more optimistic performance evaluations not because the test begins with harder questions, but because that version of the test ends with easy questions. This retrospective memory bias also influences perceptions of difficulty: participants who completed the easier tasks last had the impression that the test was easier.

The results of this study show that motivation and enjoyment levels increase as tasks get easier, which is in line with Weinstein and Roediger (2012) and Li et al. (2007). A novel aspect of the present experiment is the influence of question order on future choices. Because participants in the D-E treatment were expected to have low impressions of their performance, the easy task was expected to be chosen more frequently by participants starting with the more difficult questions. However, the easy task was in fact chosen more frequently by test-takers who had the easy items at the beginning. This result is consistent with the retrospective memory bias introduced above: participants base their decisions on their most recent impressions of performance. A subject who got the easier questions at the end was more optimistic about his performance at that point and therefore chose a task with a higher level of difficulty.

When participants started with easier items they had higher standards for final scores than those starting with harder items, but only at the first evaluation moment. After completing all questions, conformity levels did not differ statistically between question orders. Interestingly, there were no signs of an overestimation or underestimation bias in any of the question order or feedback treatments, which means that participants were able to adjust their expectations and satisfaction levels when giving their global judgements of the test. In contrast to what was hypothesized, the effect of providing feedback on performance did not differ significantly across the question order conditions, and feedback did not influence any of the judgement-of-performance measures. Individual decisions did not depend on the information given to participants about their performance. A possible reason is that test-takers did not use their intermediate scores to adjust their levels of motivation or enjoyment, their expected scores, or their conformity levels. It may also be that the experimental setting did not stimulate participants enough to use the feedback provided to their advantage.
