
The difficulty of a task is affected by a number of factors that can be divided into so-called general and subject-specific factors. General (formal) factors are those that can be encountered in knowledge tests in general and are not specific to a particular discipline. These include, for example:

– length of the text

– demands on reading comprehension (how attentively the text must be read)

– task formulation (e.g. is the posed question negative?)

– formulation of answer (e.g. presence of distractors or confusing answers, tricks)

– use of an explanatory picture

– use of an illustrative example

Subject-specific factors are tied to informatics and are specific to an informatics contest. Task difficulty can then be affected, for example, by:

– area (e.g. whether optimization tasks are more difficult than algorithmic tasks)

– way of solution (competences needed for the solution of the task)

– situation (to what extent the situation is close to the contestant’s life experience)

– presence of formal description (code, program, formula)

– expertness of the task assignment (the extent to which it uses technical terminology whose insufficient knowledge may result in failing to understand the task itself)

General factors of task difficulty in the Bebras contest are discussed by Gujberová [10], who draws attention to cognitive aspects of tasks. In her study she demonstrated an impact of formal aspects on overall task difficulty. In this respect, Pohl provides recommendations on a task’s formal aspects that result in higher task quality, e.g. short sentences, a one-to-one relationship between words and objects, and appropriate analogies [3]. Some factors have different consequences in the works of different authors. For example, Pozdnyakov states that “among numerous ways of assessment of difficulty of a text, the most straightforward is the length of the statement” [9, p. 34]. In contrast, the results of Gujberová’s research suggest no correlation between the length of the assignment and task difficulty [10].

Some subject-specific factors were studied by Křížová [11], namely the area, the presence of a formal description, and the way of solution (needed competences). These comparisons showed a significant correlation only between the presence of a formal description and task difficulty.


4 Method

In the first stage of our research we focused on defining a difficulty index for test tasks. In the second stage we looked for factors that (in our opinion and in the literature) have an impact on task difficulty and at the same time can be identified reliably in assignments.

The basic set of analyzed tasks was the set of all tasks from the Czech national rounds of the Bebras contest from 2012–2015, i.e. 283 contest tasks in 5 age categories for pupils aged 9 to 19. The smallest number of contestants solving a single task was 2,492; the average number of solvers per task was 7,198.

The reason why we excluded tasks from foreign contests was linguistic – it is very difficult to analyze tasks posed in foreign languages (e.g. how to assess demands on reading comprehension from the contestant’s point of view). Another reason is that different countries organize the contest differently, and their systems do not provide comparable data for stating which tasks were solved successfully and which quantitative factors might have had an impact on success rates (e.g. data about the time needed to solve a task) (Fig. 1).

4.1 Selection of Indicators for Stating the Difficulty Index

We determined five indicators that describe and allow prediction of the difficulty of Beaver of Informatics contest tasks. These indicators were compared, and we explored to what extent they can be regarded as relevant. The indicators are:

– Contestants’ success rate – what percentage of contestants gave a correct answer (in the case of a multiple-choice task, chose the correct alternative).

– No answer – the proportion of contestants who skipped the task, i.e. chose not to answer it.

– Authors’ opinion – how the authors of the tasks or test (i.e. pedagogical experts) define the task difficulty (the tasks are classified as easy, medium, and hard; a test in each category includes the same number of tasks of each difficulty level).

– Solving time – how long it took the contestants to answer the task.

– Contestants’ opinions – how many respondents (contestants) marked the particular task as the most difficult in a questionnaire filled in immediately after the contest.

Each of these indicators has its limits. Sometimes an interpretation other than one pointing at task difficulty may be possible. Thus the data these indicators provide cannot always be perceived as absolutely reliable.

Fig. 1. Relationship between indicators and factors of task difficulty


– Contestants’ success rate might be affected by cheating, i.e. contestants helping each other out and whispering answers. An analysis of cheating in the Czech contest shows that this objection is far from purely hypothetical [12].

– No answer may also be caused by a situation where contestants do not have enough time to complete the test and thus never reach some of the tasks.

– In the case of tasks adopted from other countries, the authors’ opinions on task difficulty may be affected by a different distribution of contestants into age categories.

– Solving time is calculated as the difference between the times of submitting the answers to the previous task and to the currently solved task. However, this may not be the time during which the contestant was actually solving the given task.

– The follow-up questionnaire is voluntary, which means data are not available from all contestants.
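As an illustration, the first two indicators reduce to simple proportions over raw answer records. The following Python sketch assumes a hypothetical table layout (one row per contestant and task); the column names are ours, not the contest system’s:

```python
# Illustrative sketch: computing the first two indicators from raw answer
# records. The table layout (columns "task_id", "answer", "correct") is a
# hypothetical example, not the contest system's actual schema.
import pandas as pd

answers = pd.DataFrame({
    "task_id": [1, 1, 1, 2, 2, 2],                    # invented toy data
    "answer":  ["A", None, "C", "x=3", "x=5", None],  # None = task skipped
    "correct": [True, False, False, True, False, False],
})

by_task = answers.groupby("task_id")
success_rate = by_task["correct"].mean()                        # share of correct answers
no_answer = by_task["answer"].apply(lambda a: a.isna().mean())  # share who skipped

print(success_rate)
print(no_answer)
```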

4.2 Comparison of Indicators for Stating Task Difficulty

The above-listed indicators are complementary, and if combined they can determine task difficulty more precisely and more comprehensively. We tried to take all these indicators into account to define a coefficient of mean difficulty. We then examined how the individual indicators differ from each other and from the mean, what their variance is, and which of the indicators best corresponds to the others.

The sources of data for this part were reduced to the results of the 2012 and 2013 national rounds of the Beaver of Informatics in the upper secondary school categories. These provided the data for the first four above-listed indicators. The data came from 18,653 evaluated contestants in 60 tasks. The reason for this reduction of the data set was that contestants’ comments on task difficulty given after the contest were available only for this set of tasks. These comments came from 1,414 respondents of the questionnaire in which they selected the most difficult test task in their category.

For all indicators, the order of tasks for a given age category was stated as follows:

– Success rate: tasks ordered by the ratio of correct answers (the lower the ratio, the more difficult the task).

– No answer: the most difficult task is the one skipped by the greatest number of contestants.

– Authors’ opinion: the order of tasks by assigned difficulty (easy, medium, hard).

– Solving time: the most difficult task is the one that took the longest to solve.

– Contestants’ opinion: tasks ordered by how many contestants selected them as the most difficult.

To assess which of these indicators has the greatest impact on task difficulty, we studied the distance of individual indicators from the mean rank value, the extent to which the indicators provide similar results when compared in pairs, and in how many tasks a particular indicator differs extremely from the other indicators.
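A minimal sketch of this comparison in Python, assuming each indicator has already been converted into a rank order of tasks within one age category; the data and the threshold for an “extreme” deviation are invented for illustration:

```python
# Sketch of the indicator comparison under stated assumptions: "ranks" holds,
# for each task (row), the difficulty rank assigned by each indicator
# (column). The numbers are invented; the study's data are not reproduced.
import pandas as pd

ranks = pd.DataFrame({
    "success_rate": [1, 2, 3, 4, 5],
    "no_answer":    [2, 1, 3, 5, 4],
    "authors":      [1, 3, 2, 4, 5],
    "solving_time": [3, 1, 2, 5, 4],
    "contestants":  [2, 3, 1, 4, 5],
}, index=["T1", "T2", "T3", "T4", "T5"])

mean_rank = ranks.mean(axis=1)  # mean rank order per task

# Distance of each indicator from the mean rank (lower = closer to consensus).
distance = ranks.sub(mean_rank, axis=0).abs().mean()

# Pairwise agreement between indicators (Spearman rank correlation).
pairwise = ranks.corr(method="spearman")

# Number of tasks where an indicator deviates "extremely" from the others,
# using an arbitrary threshold of 2 rank positions for this toy data.
extreme = ranks.sub(mean_rank, axis=0).abs().gt(2).sum()

print(distance, pairwise, extreme, sep="\n\n")
```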

4.3 Search for Factors with Impact on Task Difficulty

Based on an analysis of task assignment texts, we looked for those properties of a task that are easy to detect across age categories and that are present in a significant number of tasks. For this reason, some factors that looked very promising at first had to be dropped. The greatest risk, especially in the case of qualitative factors, is the subjectivity in deciding whether a factor is present in a task or not. The full set of tasks from 2012–2015 was used for this part of the research.

In the end we defined a total of 20 possible factors, grouped by the aspect of the task through which they can increase or decrease its difficulty: topic, way of answering, type of interactivity, other general factors, and subject-specific factors.

Difficulty Given by the Task Topic. The thematic area or topic of a contest task is assigned according to Dagienė and Futschek [13]. This categorization is used by the International Bebras Committee for proposed tasks in national contests: algorithmization; understanding information and its representation; understanding structures; and problem solving. We wanted to find out whether any of these four categories could be factors of difficulty, i.e. whether the topic itself can determine task difficulty.

Difficulty Given by the Way of Answering. The contest test allows three types of answers:

– Multiple-choice – selection of one out of four possible answers. It limits the choice of answers, involves distractors (trick choices), and allows the solving strategy of “going through all choices and comparing them”.

– Textbox – entering text into a textbox (most often a number or a word from the assignment). This form allows more variety than multiple-choice but is sensitive to syntactic errors.

– Interactive – most often manipulating objects on the desktop: moving them, putting them in a different order, or changing their appearance by clicking. Some of the tasks were programmed as games or allowed keyboard input.

Type of Interactivity. Since we anticipated that this research would verify the popularity of interactive assignments, and as there are several variants of interactive solutions, interactive tasks were divided into subcategories, and different factors were linked to these subcategories:

– Drag and drop – moving objects on the desktop, e.g. into the right order or to make pairs

– Click – tagging objects by clicking (the appearance of objects changes)

– Text – controlling the interactivity by writing text, e.g. program code

– Game – more complex control described by rules, often resembling the control of a game (puzzle, maze).

Other General Factors. General factors are those that can be encountered in test tasks regardless of the discipline they come from. These are:

– length of the text – the number of characters in the assignment

– demands on reading comprehension – how difficult it is to read the text and how much attention is needed (long sentences, repetition of similar words, precisely described situations). This difficulty may not necessarily be related to the scientific demands of the text.

– illustrative picture – a picture that illustrates the situation or explains a concept

– example – a concrete example that illustrates the rules described in the assignment and allows checking one’s understanding of the assignment

– negative question – a question formulated as e.g. “Who is not?” instead of “Who is?”. Children may miss the negation in the question and answer it as if it were positive.

Subject-Specific Factors. In informatics these are:

– Technical terminology – terms and concepts from informatics and computer science. If technical terminology is used, contestants’ success will be affected by their expert knowledge, which depends on the curriculum implemented at their school.

– Formal description – e.g. use of code, a formula, excerpts from programs, strings of seemingly disconnected characters, an abridged description, etc. Tasks that include a graph, table, or diagram do not fall into this category unless some code is included as well.

– Graphical structures – there is a graph, diagram, map, or scheme in the task assignment. Contestants must be able to read information from graphical structures, grasp them, fill data into them, or construct them from the given data.

– Optimization – optimization tasks can be perceived as a stand-alone thematic category; since optimization does not appear among the test task topics according to [13], we include it here. In some cases, tasks looking for a maximum or a minimum are also included in this category.
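For concreteness, the full grouping might be coded as below. This is a hypothetical sketch: the identifier names paraphrase the factor groups described above, the example task values are invented, and text length is numeric rather than binary (it is handled by regression, as described at the end of this section).

```python
# Hypothetical coding schema for the 20 factors; the names paraphrase the
# groups described in the text and are not the study's actual labels.
FACTOR_GROUPS = {
    "topic": ["algorithmization", "information_representation",
              "structures", "problem_solving"],
    "way_of_answering": ["multiple_choice", "textbox", "interactive"],
    "interactivity": ["drag_and_drop", "click", "text_input", "game"],
    "general": ["text_length", "reading_comprehension",
                "illustrative_picture", "example", "negative_question"],
    "subject_specific": ["technical_terminology", "formal_description",
                         "graphical_structures", "optimization"],
}
assert sum(len(v) for v in FACTOR_GROUPS.values()) == 20

# One task coded against this schema (all values invented for illustration).
# Every factor is a binary flag except text_length, which is numeric.
task_coding = {
    "task_id": "2014-C-07",   # hypothetical identifier
    "text_length": 412,       # number of characters in the assignment
    "factors": {"multiple_choice": True, "negative_question": True,
                "formal_description": False},
}
```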

Hypotheses on the impact of factors on task difficulty were formulated for these general and subject-specific factors. The null hypotheses were formulations of the type “The difference in the mean difficulty of tasks with the given factor and without this factor is zero”. By comparing against the task difficulty index, we assessed whether the given factor affects the success rate. As the hypothesis of equal variances was rejected for most of the studied factors, Welch’s statistic, which takes unequal variances into account, was used. We considered the impact of a factor to be proven if the null hypothesis was rejected at the 95 % level of significance. If the test showed a statistically significant difference between the mean values of the two sets of tasks, it could be derived whether the given factor makes the task easier or more difficult.
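A minimal sketch of this per-factor test, assuming a difficulty index per task and a binary factor flag from a coding such as the one above; the values are invented, and `scipy.stats.ttest_ind` with `equal_var=False` implements Welch’s test:

```python
# Sketch of the per-factor test. "difficulty" holds the task difficulty
# index and "has_factor" a binary presence flag; both arrays are invented.
import numpy as np
from scipy import stats

difficulty = np.array([0.31, 0.55, 0.42, 0.70, 0.28, 0.63, 0.49, 0.58])
has_factor = np.array([True, True, False, True, False, True, False, False])

with_f, without_f = difficulty[has_factor], difficulty[~has_factor]

# Check the equal-variance assumption first (Levene's test is one option).
lev_stat, lev_p = stats.levene(with_f, without_f)

# Welch's t-test: does mean difficulty differ between the two sets of tasks?
t_stat, p_value = stats.ttest_ind(with_f, without_f, equal_var=False)

if p_value < 0.05:
    direction = "harder" if with_f.mean() > without_f.mean() else "easier"
    print(f"factor has an effect (p={p_value:.3f}); tasks with it are {direction}")
else:
    print(f"no significant effect (p={p_value:.3f})")
```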

The only factor assessed differently was the length of the text. Here the length parameter was the numerical value of the number of characters in the task assignment, which allowed linear regression.
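A sketch of this regression under the same caveats (invented data; the study’s actual regression details are not reproduced here):

```python
# Simple linear regression of the difficulty index on the number of
# characters in the assignment; the data points below are invented.
import numpy as np
from scipy import stats

text_length = np.array([180, 250, 320, 410, 520, 610, 700])  # characters
difficulty = np.array([0.30, 0.35, 0.42, 0.41, 0.55, 0.52, 0.60])

result = stats.linregress(text_length, difficulty)
print(f"slope={result.slope:.5f}, r={result.rvalue:.3f}, p={result.pvalue:.3f}")
```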