
From Product- to Process-Oriented Psychometrics: Slow/Fast Response Classes and Class-Specific Item Parameters as Indicators of Response Process

Zenab Tamimy 10324364

Supervised by Maria Bolsinova

June 2016

ABSTRACT: With the emergence of computerized testing, response times have become more readily available and are therefore increasingly used in modelling responses. Focusing on response times rather than on response accuracy alone has considerable benefits; for instance, response times may be indicators of underlying response processes. This thesis provides a brief overview of existing models that incorporate response time. The viability of one of the most recent of these, the Response Mixture Model with Markov property, is tested in a simulation study. In this conjugate model, response accuracy is modelled class-specifically. Across various designs, the model proved to perform well in classifying responses. Modelling responses this way has an important advantage: allowing the accuracy model to differ between processes may account for possible differences in the latent variable underlying a response.


Introduction

Historically, researchers and practitioners in educational measurement have tended to emphasize the product of cognitive functioning rather than the process of cognitive functioning (Levine, Preddy & Thorndike, 1987). This resulted in a strong interest in, and use of, the accuracy of a response, based on the idea that accuracy is directly related to an underlying latent ability, with high accuracy suggesting high ability (Hofman, Brinkman, Van Der Maas & Maris, 2016). Accuracy was therefore seen as the most informative product (Levine, Preddy & Thorndike, 1987), while speed and the related response time¹ received little attention. The importance of speed in educational measurement was, however, never denied. For example, Thorndike, Bregman, Cobb and Woodyard (1926) refer to speed and response time in their third theorem on the measurement of intelligence: "Other things being equal, if intellect A can do at each level the same number of tasks as intellect B, but in less time, intellect A is better." The theorem suggests that not only the accuracy of a response is informative in distinguishing between higher and lower competence; speed is an important factor as well. Speed can add an extra dimension to the assessment of a person's competence on specific tasks (e.g. intelligence tests or educational tasks). More specifically, this thesis argues that speed and the related response time may serve as indicators of the process underlying a response.

The emphasis on the product of cognitive functioning has led to significant progress in educational measurement regarding response accuracy, with a clear shift in the measurement models used to relate item responses to ability (Fan, 1998). Equation 1 gives a representation of the two-parameter logistic (2PL) IRT model, an extension of the one-parameter logistic (1PL) Rasch model (Rasch, 1960; De Ayala, 2013).

¹ Note that speed and response time do not refer to the same attribute. Speed is a latent variable, whereas response time is the manifest variable related to it: the larger the response time, the lower the speed.

$$P(X_i = 1 \mid \theta) = \frac{e^{\alpha_i \theta - \beta_i}}{1 + e^{\alpha_i \theta - \beta_i}} \tag{1}$$

In this model the probability of an accurate response is related to item and person characteristics. The person characteristic is referred to as ability (θ) and is a between-person parameter. The item characteristics are: 1) difficulty (β), which determines the logit of the probability of a correct response for a person with zero ability; mathematically, β locates the center of the logistic curve; and 2) discrimination (α), the degree to which an item discriminates between persons; mathematically, α determines the maximum slope of the logistic curve (De Ayala, 2013; Rasch, 1960; DeMars, 2010).
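As a concrete illustration of Equation 1 (the parameter values below are arbitrary, chosen only for this sketch), the 2PL response probability can be evaluated numerically:

```python
import math

def p_correct_2pl(theta, alpha, beta):
    """Probability of a correct response under the 2PL model (Equation 1)."""
    z = alpha * theta - beta
    return 1.0 / (1.0 + math.exp(-z))

# An average-ability person (theta = 0) on an item with beta = 0 answers
# correctly with probability 0.5, regardless of the discrimination alpha.
print(p_correct_2pl(theta=0.0, alpha=1.5, beta=0.0))  # 0.5
# Higher ability raises the probability; a larger alpha steepens the curve.
print(p_correct_2pl(theta=1.0, alpha=1.5, beta=0.0))  # ~0.82
```

The logistic form guarantees probabilities strictly between 0 and 1 for any finite parameter values.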

Meanwhile, interest in response time and speed got off to a relatively slow start. Although models combining response time and response accuracy have existed for decades (Roskam, 1987; Verhelst, Verstralen & Jansen, 1997), they were not broadly used. The use of response times in research increased with the emergence of computerized testing, which made response times easy to register and widely available (Molenaar, 2015).

Nonetheless, the first line of research concerning response time and speed predates computerized testing. As the importance of speed and response times was never denied, but response times were difficult to obtain, they were first approached conceptually. This first line of research mainly concerned the speed-accuracy tradeoff, a phenomenon first named by Fitts (1954). According to Fitts, the speed-accuracy tradeoff underlies the precision of a response: in principle, the more accurate a response is, the more time responding takes, while the less time a person takes to respond, the less likely the response is to be correct. Thus, Fitts argues, there is a negative correlation between speed and accuracy: the speed-accuracy tradeoff. The occurrence of this tradeoff would mean that tests aiming to measure ability do not capture the ability of a person alone, but also the position of the person on the speed-accuracy tradeoff. The tradeoff thus suggests that accuracy is influenced not only by the underlying ability but also by the amount of time a person chooses to spend on an item or task. With the acceptance of this idea, interest in speed and response times grew; it was only a matter of the availability of response times until the first response time models were constructed.

The first models including response time and speed were motivated by the idea of the speed-accuracy tradeoff (Roskam, 1987; Verhelst, Verstralen & Jansen, 1997). These models directly embedded the characteristics of the tradeoff: the less time a respondent spent on an item, the less likely the estimated ability of this person was to be high. Van der Linden (2007; 2009), however, argued that the speed-accuracy tradeoff does not have to be included when modelling speed and response times, because the tradeoff only exists within persons. When respondents take a test, Van der Linden argued, they choose a level of speed at which they operate. If the set of items is fixed and the person is fixed, the only thing that matters is the position on the speed-accuracy tradeoff the person has chosen. Van der Linden therefore presented a new way of modelling: the hierarchical model (Van der Linden, 2007; 2009). This is a multilevel model in which a multivariate normal distribution of speed and ability, the population parameters, is assumed at the second level. At the same level one finds the item parameters, which also follow a joint distribution. At the first level of the model, the speed population parameter and the item parameters time intensity and discriminating power underlie the response time, while the population parameter ability and the item parameters known from IRT models are assumed to underlie response accuracy. Further research has led to simplified versions of this model, in which only a joint distribution of the population parameters is assumed (Molenaar, Oberski, Vermunt & De Boeck, 2012; Hofman, Brinkman, Van Der Maas & Maris, 2016).

The common ground on which the (simplified) hierarchical models operate is that they all model between-subject differences; the latent variables are thus assumed to be static within subjects. From a statistical perspective this is not problematic, since the assumption can be accommodated by modelling the conditional dependence between response accuracy and response time (Ferrando & Lorenzo-Seva, 2007; Molenaar, Tuerlinckx & Van Der Maas, 2015; Ranger, 2013). Recently, Goldhammer (2015) proposed an approach to correct for dynamics in the speededness of test takers: an experimental approach in which the time to respond is fixed for every respondent, so that the latent speed variable is fixed within persons. Although this may work in restricted situations (Bolsinova & Tijmstra, 2015), implementing response times in measurement models without assuming fixed speededness within persons has some interesting advantages (Molenaar, 2015).

First, tracing within-person fluctuations in speededness can give information on dynamic test behavior, which can occur in various situations. For example, a learning factor may influence the responses: when a test has similar items, a respondent may first answer items incorrectly, but as the respondent proceeds, learning occurs and the items are answered correctly. Another possibility leading to dynamic test behavior is a respondent who is aware of answering an item incorrectly and therefore slows down on the following items (i.e. compromises on speed, so that accuracy increases); this phenomenon is known as post-error slowing. When conditional dependence between response time and response accuracy is found, this can be an indicator of dynamic test behavior.


Secondly, information on within-person fluctuations in speededness can be used to assign different scores to different speed-accuracy compromises (Molenaar, Bolsinova, Rozsa & De Boeck, in press). For instance, one might classify a correct and fast response as better than a correct and slow response. Response times can thus be seen as additional information about a person's competence to answer a specific item.

Thirdly, differences in speededness, and thus in response times, within persons can be seen as indicators of the underlying processes of responding (Molenaar, 2015). When answering an item, a respondent passes through a certain cognitive process; the response time can be seen as the time from the beginning of that process up until the actual response. Differences in response time can therefore indicate differences in response strategy or process. For example, as illustrated by Van der Maas and Jansen (2003), children differ qualitatively in performance on the balance scale task, and these differences become noticeable when response time is taken into account. For some items, some subjects who answered correctly did so fast, while others who answered correctly did so more slowly. These differences may imply different solution strategies; theoretical considerations of the specific items led Van der Maas and Jansen (2003) to believe this was the case, as it was plausible to assume two different solution strategies, one faster than the other.

This might also be the case for certain mathematical items, mainly items that are not easily memorized or calculated; see Table 1 for an example. One can think of various strategies for solving this type of item. For instance, there might be a fast strategy that relies mainly on information retrieval from long-term memory, as in the fast strategy column of Table 1. A slow response can imply a slow strategy in which multiple elements have to be retrieved from long-term memory and then combined into a final response by computation in working memory. The first strategy involves fewer cognitive operations, so the process underlying the response might be quicker than in the second strategy.


Table 1. Strategies for solving a mathematical item.

Math item          Fast strategy        Slow strategy
34 * 19 = 646      34 * 20 = 680        30 * 10 = 300
                   680 - 34 = 646       30 * 9 = 300 - 30 = 270
                                        4 * 10 = 40
                                        4 * 9 = 36
                                        300 + 270 + 40 + 36 = 646
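The two strategies of Table 1 can be written out as small procedures. This is only an illustration of the hypothesized arithmetic steps, not a cognitive model:

```python
def fast_strategy():
    # Retrieve 34 * 20 = 680 from long-term memory, then subtract once.
    return 34 * 20 - 34

def slow_strategy():
    # Decompose into partial products and sum them in working memory.
    return 30 * 10 + (30 * 10 - 30) + 4 * 10 + 4 * 9

print(fast_strategy(), slow_strategy())  # 646 646
```

Both routes arrive at the same answer, but the slow strategy holds four intermediate results in working memory, whereas the fast one holds only one.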

If the difference in execution time between the fast and slow strategy is large, the item parameters related to response accuracy are likely to differ as well. One could argue that certain cognitive processes are easier than others simply because mistakes are less likely when following that particular process. For example, response processes that rely fully on information retrieval (fast) might be easier than slow processes, because the slow process makes heavier use of working memory. Working memory is more sensitive to information loss than long-term memory, and the loss of information may lead to incorrect responses (Cowan, 1999; Partchev & De Boeck, 2012). It is therefore more likely that slow processing leads to more incorrect answers. Since the difficulty parameter of an item in IRT depends on the proportion of correct answers, assuming a different difficulty parameter for fast and slow responses is plausible. A different discrimination parameter for the different strategies is also plausible, for instance because of differences in the number of classifications per item or differences in the variance of response accuracy.

Separating fast responses from slow ones can be done in various ways. First, as illustrated in the IRTree model (Partchev & De Boeck, 2012), response times can be classified as fast or slow by a median item or median person split: response times smaller than the cutoff value, the median, are classified as fast, while response times larger than the cutoff value are classified as slow. Another way of differentiating between fast and slow is to look at the residual response time of a response. For instance, extreme residual response times can suggest different strategy use, while patterns in residual response times may suggest dynamic test behavior (Molenaar, Bolsinova, Rozsa & De Boeck, 2016). Finally, response times can be separated into fast and slow by using information from both the response accuracies and the response times (Molenaar, Oberski, Vermunt & De Boeck, 2016). It is likely that the classification of a response depends on the class of the previous response; for this reason Molenaar, Oberski, Vermunt and De Boeck (2016) specified a mixture model with a Markov structure to classify slow and fast responses. In contrast to the previously mentioned models, this model assumes a different distribution of response times in each class. In this thesis the viability of this Response Mixture Model (RMM) will be tested.

The outline of this thesis is as follows. First, the three models mentioned above are briefly discussed. Secondly, a simulation study is conducted to test the viability of the Response Mixture Model under various designs (i.e. to determine its power in classifying responses). To this end, data are simulated under the model, and the parameters are estimated using a Markov chain Monte Carlo sampling algorithm (Casella & George, 1992). The results of the power analysis and the evaluation of the parameter estimation are then presented and discussed. Finally, the conceptual consequences of this thesis are treated in the discussion section.

Models

Measurement models that include response time and speed typically come in three types (Hofman, Brinkman, Maris & Van Der Maas, 2015). First, there are models in which response time and accuracy are conditionally independent; the previously mentioned hierarchical model (Van Der Linden, 2007; 2009) assumes this conditional independence. Secondly, there are models in which response time and accuracy are not considered independent; Maris and Van Der Maas (2015) suggest a model without this conditional independence. Finally, there are models in which the accuracy model is conjugate on speed, i.e. conditional on the speed-based response class (Equation 2).

$$P(X_{pi} = 1 \mid \theta_p, Z_{pi} = z) = \frac{e^{\alpha_{zi} \theta_p - \beta_{zi}}}{1 + e^{\alpha_{zi} \theta_p - \beta_{zi}}} \tag{2}$$

In this type of model the parameters of the accuracy model depend on the classification of a response, so the model for response accuracy differs per state. That is, the probability of person p answering item i correctly (X_pi = 1), given the ability of person p (θ_p) and the classification of the response (Z_pi = z), is defined by the 2PL IRT model with class-specific difficulty and discrimination parameters (β_zi, α_zi).

The IRTree Model (Partchev & De Boeck, 2012)

Partchev and De Boeck (2012) suggested the IRTree model for modelling response times. A representation of the model is found in Figure 1.

Figure 1. Tree representation of the IRTree model.


In the tree, going left twice represents a fast and correct response, and going right twice a slow and incorrect response. The response times are divided into fast/slow classes by a within-person or within-item median split: the response time of a person on an item is compared to the (person or item) median, and classified as fast if it is smaller and as slow if it is larger. The probability of a correct response is then computed as in Equation 2, with class-specific item parameters. The model is very useful for identifying item parameters that differ per class and for showing that different processes underlying a response exist, as Partchev and De Boeck (2012) demonstrated for Matrix Reasoning and Verbal Analogies tasks.
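The within-person median split used by the IRTree approach can be sketched as follows; the response times below are made up for illustration:

```python
import statistics

def median_split(times):
    """Classify each response time as fast (0) or slow (1) relative to the
    person's own median (a within-person median split)."""
    m = statistics.median(times)
    return [0 if t < m else 1 for t in times]

rts = [2.1, 5.4, 3.0, 8.2, 2.8, 6.7]  # hypothetical response times in seconds
print(median_split(rts))  # [0, 1, 0, 1, 0, 1]
```

By construction this split classifies about half of each person's responses as fast, which is exactly the assumption the residual response time model below relaxes.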

The Residual Response Time Model (Molenaar, Bolsinova, Rozsa & De Boeck, 2016)

With the median-split way of classifying item response times, it is assumed either that each person uses the fast strategy as often as the slow strategy (within-person split) or that for each item the fast strategy is used as often as the slow strategy (within-item split) (Molenaar, Oberski, Vermunt & De Boeck, 2016). This is not necessarily the case. Therefore Molenaar, Bolsinova, Rozsa and De Boeck (2016) suggested a model in which the residual response time is used to classify a response into a slow or fast class (the RRT model). Response times are assumed to follow a lognormal distribution:

$$T_{pi} \sim \log N(\xi_i - \tau_p, \sigma_i) \tag{3}$$

That is, the response time of person p on item i (T_pi) follows a lognormal distribution with as mean the time intensity of item i (ξ_i) minus the speed of person p (τ_p), and with standard deviation σ_i for item i.

Classification in this model depends on the residual response time of a person on an item.

$$\varepsilon_{pi} = T_{pi} - (\xi_i - \tau_p) \tag{4}$$

The residual response time of person p on item i (ε_pi) is the time spent on the item (T_pi) minus the expected time (ξ_i − τ_p), determined by the time intensity of the item and the speed of the person. In other words, the residual response time is the difference between the actual and the expected time spent on an item. The model classifies responses probabilistically; the probability of a response being classified as slow is:

$$P(Z_{pi} = 1) = \frac{e^{\zeta_{0i} + \zeta_{1i} \varepsilon_{pi}}}{1 + e^{\zeta_{0i} + \zeta_{1i} \varepsilon_{pi}}} \tag{5}$$

The classification thus depends on three parameters. The first is the residual response time of person p on item i (ε_pi): a large residual response time is likely to suggest a slow response, and a small residual response time a fast response. The ζ_0i parameter can be seen as the difficulty of answering the particular item slowly, or as the cutoff from which a response time will be classified as slow. The ζ_1i parameter can be seen as the strictness with which this cutoff is applied: the higher ζ_1i, the more likely it is that a response is classified according to the ζ_0i cutoff. The IRTree model, for example, corresponds to a ζ_0 equal to the median of the item or the person and an infinite ζ_1.
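Equation 5 is an ordinary logistic function of the residual response time; a small sketch with illustrative parameter values makes the role of ζ_1 visible:

```python
import math

def p_slow(residual, zeta0, zeta1):
    """Probability that a response is classified as slow (Equation 5)."""
    z = zeta0 + zeta1 * residual
    return 1.0 / (1.0 + math.exp(-z))

# With zeta0 = 0 the cutoff sits at a residual of zero; a larger zeta1
# makes classification around that cutoff stricter (more deterministic).
for z1 in (1.0, 10.0):
    print(p_slow(0.5, zeta0=0.0, zeta1=z1))
```

For a residual of 0.5, the probability of a slow classification is roughly 0.62 under the lenient setting and above 0.99 under the strict one, approaching the hard median split as ζ_1 grows.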

The response accuracies are modelled as in Equation 1. The distribution of the latent variables is as follows:

$$\theta_p, \tau_p \sim MVN(0, \Sigma_{\theta\tau}) \tag{6}$$

That is, the latent variables ability (θ) and speed (τ) follow a multivariate normal distribution with mean vector 0 and covariance matrix Σ_θτ of ability and speed.


The Mixture Model with Markov Property (Molenaar, Oberski, Vermunt & De Boeck, 2016)

The previous model assumes a single distribution for fast and slow response times, with classification based on the residual response time, i.e. the difference between the expected and actual response time. This implies that the classification is based on a cutoff within one response time distribution, where large residual response times are taken to indicate slow responses and small or negative residual response times fast responses. However, when response strategies are assumed to be qualitatively different, it may be more plausible to assume a mixture distribution, i.e. two different distributions for slow and fast responses. Molenaar, Oberski, Vermunt and De Boeck (2016) specified a model in which response times follow such a mixture distribution.

$$T_{pi} \sim \log N(\xi_{zi} - \tau_p, \sigma_i) \tag{7}$$

The model states that response times follow a lognormal distribution with as mean the difference between the class-specific time intensity of an item (ξ_zi) and the speed parameter of the person, and with standard deviation σ_i for item i. Since the time intensity of an item is class specific, fast and slow response times do not follow the same distribution but two different distributions, in contrast to the RRT model.

An extension of the mixture model can be made by assuming a Markov structure on the class membership of consecutive responses, meaning that the class of a response depends on the class of the previous one. This extension can be supported theoretically: when a respondent answers a first item slowly, the respondent is presumably not yet familiar with this type of item and will answer the second item slowly as well, whereas once learning occurs and the respondent answers an item correctly and fast, it is more likely that the following item will also be answered fast, because the respondent has learned how to solve this particular type of item. When implementing the Markov property, the class probabilities are specified as follows:

$$P(Z_{p1} = 1) = \pi_0 \tag{8}$$

$$P(Z_{pi} = 1 \mid Z_{p(i-1)} = 0) = \pi_{10} \tag{9}$$

$$P(Z_{pi} = 1 \mid Z_{p(i-1)} = 1) = \pi_{11} \tag{10}$$

That is, there is a probability π_0 that the first response is classified as slow (Z_p1 = 1). The probability that a subsequent response is classified as slow depends on the previous response: if the previous response was classified as fast (Z_p(i−1) = 0), the probability that the current response is classified as slow is π_10; if the previous response was classified as slow (Z_p(i−1) = 1), this probability is π_11.
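Under Equations 8–10, class membership forms a two-state Markov chain. A simulation sketch, using the baseline transition probabilities from the simulation study below:

```python
import random

def simulate_classes(n_items, pi0, pi10, pi11, seed=1):
    """Simulate slow (1) / fast (0) class labels with the Markov property
    of Equations 8-10."""
    rng = random.Random(seed)
    z = [1 if rng.random() < pi0 else 0]  # first response: P(slow) = pi0
    for _ in range(n_items - 1):
        p_slow = pi11 if z[-1] == 1 else pi10  # depends on previous class
        z.append(1 if rng.random() < p_slow else 0)
    return z

# pi11 = 0.85: a slow response tends to be followed by another slow one.
print(simulate_classes(10, pi0=0.5, pi10=0.15, pi11=0.85))
```

With these values the simulated class sequences show long runs of the same class, the persistence that the Markov extension is meant to capture.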

Response accuracies are modelled with the 2PL IRT model of Equation 2, with class-specific item parameters. The discrimination parameter is class specific and may, but need not, also differ across items.

Simulation

Methods

Data Simulation

To test the viability of the Response Mixture Model, the model was applied to various simulated datasets. The data for the baseline design were simulated for 1000 subjects and 40 items. The true discrimination parameters were set to α = 1.5 for responses in the fast class and α = 1 for responses in the slow class; these values were chosen because the literature suggests that slow responses discriminate better (Partchev & De Boeck, 2012; Molenaar, Bolsinova, Rozsa & De Boeck, 2016). The discrimination parameter was fixed across items but allowed to differ between classes. The true difficulty and time intensity parameters were sampled from a multivariate normal distribution.

$$\beta_0, \beta_1, \xi_0, \xi_{diff} \sim MVN(\mu_{\beta\xi}, \Sigma_{\beta\xi}) \tag{11}$$

$$\xi_1 = \xi_0 + e^{\xi_{diff}} \tag{12}$$

Here μ_βξ is the vector of the means of, consecutively, the difficulty parameter for fast responses (μ_β0 = 0), the difficulty parameter for slow responses (μ_β1 = 1), the time intensity parameter for fast responses (μ_ξ0 = 1), and the difference in time intensity between fast and slow responses (μ_ξdiff = log(σ_i)), where σ_i refers to the residual standard deviation of the response times and was set to σ_i = 0.2. The time intensity for slow responses (ξ_1) was then computed by adding the exponent of the difference in time intensities between classes to the fast-class time intensity (Equation 12).

Σ_βξ =

           β0     β1     ξ0     ξdiff
  β0      1.0    0.5    0.5    0.000
  β1      0.5    1.0    0.0    0.000
  ξ0      0.5    0.0    1.0    0.000
  ξdiff   0.0    0.0    0.0    0.001

The person parameters, speed and ability, were also sampled from a multivariate normal distribution.

$$\theta_p, \tau_p \sim MVN(\mu_{\theta\tau}, \Sigma_{\theta\tau}) \tag{13}$$

Here μ_θτ is the vector of the means of ability (θ) and speed (τ), which were both set to 0. The covariance matrix of speed and ability (Σ_θτ) was set such that there was a correlation of .6 between the two, with a variance of speed of σ_τ² = .01 and a variance of ability of 1.
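As a rough sketch, data under the baseline design could be generated as follows. This is a deliberately simplified illustration, not the code used in the thesis: item parameters are held fixed across items rather than sampled from Equation 11, and ability and speed are supplied directly instead of drawn from Equation 13.

```python
import math
import random

rng = random.Random(42)

def simulate_person(n_items, theta, tau, alpha=(1.5, 1.0), beta=(0.0, 1.0),
                    xi=(1.0, 1.2), sigma=0.2, pi0=0.5, pi10=0.15, pi11=0.85):
    """Simulate classes, response times, and accuracies for one person.
    Index 0 = fast class, 1 = slow class; xi[1] = xi[0] + exp(log(0.2))
    mirrors Equation 12 with the baseline sigma of 0.2."""
    z_prev, data = None, []
    for _ in range(n_items):
        # Class membership: Markov chain of Equations 8-10.
        p_slow = pi0 if z_prev is None else (pi11 if z_prev else pi10)
        z = int(rng.random() < p_slow)
        # Response time: lognormal with class-specific time intensity (Eq. 7).
        t = math.exp(rng.gauss(xi[z] - tau, sigma))
        # Accuracy: 2PL with class-specific item parameters (Eq. 2).
        p_correct = 1.0 / (1.0 + math.exp(-(alpha[z] * theta - beta[z])))
        x = int(rng.random() < p_correct)
        data.append((z, t, x))
        z_prev = z
    return data

responses = simulate_person(n_items=40, theta=0.0, tau=0.0)
print(responses[:3])  # first three (class, response time, accuracy) triples
```

Repeating this for every simulated subject yields a dataset of the same structure as the baseline design.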

Designs

The parameter values that differed across designs can be found in Table 2. The model was tested for a total of nine designs; in each design only one parameter was changed. Three designs differed in the number of subjects (N) or number of items (n). The Small Number of Subjects and Extra Small Number of Subjects designs were specified to test to what extent the parameter estimates converge to the true values when there are 500 and 250 subjects, respectively. Testing the model for these designs is necessary because a dataset with 1000 subjects is not always available in psychological measurement, so information is needed about the viability of the model with fewer subjects. The Small Number of Items design represents situations in which only a small number of items is presented to the participants, e.g. due to lack of time.

Two designs differed in the transition probabilities: one represents an unstable transition probability condition and one represents data without a Markov property. The Instable Transition design represents situations in which switching from slow to fast and vice versa is relatively easy; for instance, if a test consists of a set of differing items, switching between states may become unstable because no obvious learning occurs. The No Markov design is the extreme case of the Instable Transition design, in which the classification is assumed not to depend on the previous classification at all, for instance when the items in a test are totally different.

The other two designs assumed no difference in the difficulty and discrimination parameters. Non-differing difficulty and discrimination parameters can occur when there are no (noticeable) differences in the underlying response process and the difference in response times is only due to the speed of certain cognitive functions. Since it has not been proven that this cannot be the case, however unlikely, it is important to test whether the model is viable in these situations as well. Non-differing difficulty and discrimination parameters are therefore represented in the No Difficulty Difference and No Discrimination Difference designs.

The differing parameter values were all chosen to investigate the limits of the model. It is expected that in each of the proposed designs the proportion of correctly classified responses will be lower than in the baseline design. This will most likely occur in the Small Time Intensity Diff design, because the ξ_diff parameter specifies the difference between the distributions of the slow and the fast responses: when this difference is smaller, the differences in response times between the fast and slow classes will be smaller as well, making it more difficult to classify the responses.

The other designs will influence the parameter estimation as well, which will probably also lead to more misclassification of responses. For instance, in the designs with fewer subjects than the baseline design, the estimates are expected to converge to the true values, but the variance and the MSE are expected to increase as the number of subjects decreases. The same is expected for the Small Number of Items design.

Table 2

The parameters per design

Design                         N      n    α0    α1   μβ0   μβ1   μξ0   μξdiff       π11    π10    π0
Baseline                       1000   40   1.5   1    0     1     1     log(σ)       0.85   0.15   0.5
Instable Transition            1000   40   1.5   1    0     1     1     log(σ)       0.60   0.40   0.5
No Markov                      1000   40   1.5   1    0     1     1     log(σ)       0.50   0.50   0.7
Small Number of Subjects       500    40   1.5   1    0     1     1     log(σ)       0.85   0.15   0.5
XS Number of Subjects          250    40   1.5   1    0     1     1     log(σ)       0.85   0.15   0.5
Small Number of Items          1000   20   1.5   1    0     1     1     log(σ)       0.85   0.15   0.5
Small Time Intensity Diff      1000   40   1.5   1    0     1     1     log(0.5*σ)   0.85   0.15   0.5
No Difficulty Difference       1000   40   1.5   1    0     0     1     log(σ)       0.85   0.15   0.5
No Discrimination Difference   1000   40   1.0   1    0     1     1     log(σ)       0.85   0.15   0.5

Note: For each design only one parameter was changed in comparison to the baseline design. In the No Difficulty Difference design β0 = β1.


Estimation

The parameters were estimated with a Gibbs sampler. This sampling method uses a Markov chain Monte Carlo algorithm to sample from a multivariate distribution when direct sampling is not possible (Casella & George, 1992): the parameters are sampled in turn from their conditional posterior distributions, conditioned on the current values of all other parameters and the data, and this step is repeated over multiple iterations. Since the joint distribution of the parameters is proportional to the product of the data density and the prior distributions, the joint data density is given by:

$$h(X_p, T_p) = \sum_{Z_{p1}=0}^{1} \cdots \sum_{Z_{pn}=0}^{1} f(X_p, T_p \mid Z_p) \, \Pr(Z_{p1}) \prod_{i=2}^{n} \Pr(Z_{pi} \mid Z_{p(i-1)}) \tag{14}$$

That is, the joint data density is the density of the vectors of responses (X_p) and response times (T_p) of person p, conditioned on the vector of classifications of person p on all n items (Z_p), multiplied by the probability of the classification of person p on item 1 (Pr(Z_p1)) and by the transition probabilities given the previous classification (Pr(Z_pi | Z_p(i−1))), summed over all possible classification vectors.
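The sum over all 2^n classification vectors in Equation 14 need not be computed by brute force: the Markov structure allows a forward recursion over items. A sketch for a generic per-response density f (the density values below are placeholders for f(x_pi, t_pi | Z_pi = z)):

```python
def joint_density(densities, pi0, pi10, pi11):
    """Forward recursion evaluating Equation 14.
    densities[i][z] = f(x_pi, t_pi | Z_pi = z) for response i, class z."""
    # Initialise with the first response: Pr(Z_p1 = z) * f(... | z).
    a = [(1 - pi0) * densities[0][0], pi0 * densities[0][1]]
    for d in densities[1:]:
        a = [
            (a[0] * (1 - pi10) + a[1] * (1 - pi11)) * d[0],  # move to fast
            (a[0] * pi10 + a[1] * pi11) * d[1],              # move to slow
        ]
    return a[0] + a[1]

# With all densities equal to 1 the recursion just sums the probabilities
# of all class sequences, so the result is 1 (up to floating point).
print(joint_density([(1.0, 1.0)] * 5, pi0=0.5, pi10=0.15, pi11=0.85))
```

This reduces the cost from exponential to linear in the number of items, which is what makes likelihood evaluation for such hidden Markov structures tractable.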

The priors for the person parameters are given below.

$$\theta_p, \tau_p \sim MVN(\mu_P, \Sigma_P) \tag{15}$$

where

$$\mu_P = [0, 0] \tag{16}$$

$$\Sigma_P = \begin{bmatrix} 1 & \rho_{\theta\tau}\,\sigma_\tau \\ \rho_{\theta\tau}\,\sigma_\tau & \sigma_\tau^2 \end{bmatrix}$$


That is, the prior distribution of the ability and speed of person p (θ_p, τ_p) is a multivariate normal distribution with mean vector μ_P and covariance matrix Σ_P, where ρ_θτ is the correlation between speed and ability, σ_τ the standard deviation of speed, and σ_τ² the variance of speed.

For each parameter specified in the covariance matrix a prior should be specified as well.

$$\rho_{\theta\tau} \sim U(-1, 1) \tag{17}$$

$$f(\sigma_\tau) \propto \frac{1}{\sigma_\tau} \tag{18}$$

That is, the prior of the correlation between speed and ability follows a uniform distribution on the interval [-1, 1], and the prior of the standard deviation of speed σ_τ is proportional to 1/σ_τ. The latter is an improper prior: a prior that does not integrate to one and is therefore not a proper probability distribution. Such a prior can be used when there is enough data for it to be updated to a proper posterior distribution.

The priors for the discrimination parameters are also improper.

$$f(\alpha_0) \propto \frac{1}{\alpha_0} \tag{19}$$

$$f(\alpha_1) \propto \frac{1}{\alpha_1} \tag{20}$$

For the other item parameters a conjugate prior is assumed, i.e. a prior from the same family of distributions as the resulting posterior. In other words, when a multivariate normal distribution is used as a prior, the posterior distribution will also be multivariate normal.


The priors for μ_I and Σ_I are as follows:

$$\mu_I \sim MVN(\mu_{\mu_I}, \Sigma_{\mu_I}) \tag{22}$$

$$\Sigma_I \sim W(6, I) \tag{23}$$

That is, the mean vector for the item parameters (μ_I) follows a multivariate normal distribution with mean vector μ_μI with all values equal to zero, and covariance matrix Σ_μI with all variances equal to 100. The covariance matrix for the item parameters (Σ_I) follows a Wishart distribution with six degrees of freedom and an identity matrix as scale parameter. For the classification parameters, priors are specified for the transition probabilities and for the classification of a response.

$$\pi_0 \sim B(1, 1) \tag{24}$$

$$\pi_{10} \sim B(1, 1) \tag{25}$$

$$\pi_{11} \sim B(1, 1) \tag{26}$$

$$Z_{p1} \sim Binom(\pi_0) \tag{27}$$

That is, the prior for the starting probability (𝜋0) and the transition probabilities (𝜋10, 𝜋11) is

a Beta distribution with interval [1,1]. The prior for a classification of the first response (𝑍𝑝1)

is a binomial distribution with 𝜋0 as the probability.
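The Beta(1, 1) priors above are conjugate to the Bernoulli class indicators, so the posterior of a transition probability is again a Beta distribution. A minimal sketch of such a conjugate update (Python for illustration; the indicator values are invented):

```python
import numpy as np

# Conjugate Beta-Bernoulli update: a Beta(a, b) prior combined with
# Bernoulli observations yields a Beta(a + successes, b + failures) posterior.
a, b = 1.0, 1.0                       # Beta(1, 1), i.e. uniform on [0, 1]
z = np.array([1, 0, 1, 1, 0, 1])      # hypothetical slow(1)/fast(0) indicators

a_post = a + z.sum()                  # successes update the first shape
b_post = b + len(z) - z.sum()         # failures update the second shape
posterior_mean = a_post / (a_post + b_post)
```

The posterior remains in the Beta family, which is what makes a Gibbs step for these probabilities a simple draw from a Beta distribution.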

As mentioned previously, the full posterior distribution is proportional to the product of the data density and the priors. Since conditional independence is assumed between speed and ability, a measurement model can be specified for each separately. This means that the posterior distribution of a parameter is proportional to the product of its prior distribution and the related measurement model. Thus, the full conditional posterior of each parameter does not have to condition on all the other parameters. To simplify the conditional posteriors, data augmentation is used: for each response a binary auxiliary variable 𝑍𝑝𝑖 is introduced and sampled along with all the other parameters. The prior distribution for 𝒁𝒑 is as follows:

𝑓(𝒁𝒑) = 𝐵𝑖𝑛𝑜𝑚(𝑍𝑝1; 𝜋0) ∏𝑛𝑖=2 𝐵𝑖𝑛𝑜𝑚(𝑍𝑝𝑖; 𝜋10(1 − 𝑍𝑝(𝑖−1)) + 𝜋11𝑍𝑝(𝑖−1)) (28)
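This Markov prior can be evaluated directly for a given class sequence. A small sketch (Python for illustration; the probabilities and the sequence are invented):

```python
import numpy as np

# Evaluating the prior probability of a class sequence under the Markov
# structure: the first response uses the starting probability, every later
# response uses the transition probability given the previous class.
pi0, pi10, pi11 = 0.5, 0.15, 0.85     # illustrative values
Zp = np.array([0, 0, 1, 1, 0])        # hypothetical slow(1)/fast(0) labels

def bern(z, p):
    """Bernoulli probability of outcome z given success probability p."""
    return p if z == 1 else 1.0 - p

log_prob = np.log(bern(Zp[0], pi0))
for i in range(1, len(Zp)):
    p_slow = pi10 * (1 - Zp[i - 1]) + pi11 * Zp[i - 1]
    log_prob += np.log(bern(Zp[i], p_slow))
```

Working on the log scale avoids numerical underflow for long sequences.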

For instance, when sampling the ability of person p it is not necessary to condition on all the other person parameters, the time intensity parameters, the data of other persons, or the response times of person p.

𝜃𝑝~ 𝑓(𝜃𝑝|𝜏𝑝, 𝜎𝜏2, 𝜌𝜃𝜏, 𝜷𝒁𝟏, 𝜷𝒁𝟎, 𝜶, 𝑿𝒑, 𝒁𝒑) (29)

That is, the estimation of the ability parameter of person p is only influenced by the speed parameter of person p (𝜏𝑝), the variance of speed (𝜎𝜏2), the correlation between speed and ability (𝜌𝜃𝜏), the vectors of item difficulties in the slow and fast classes (𝜷𝒁𝟏, 𝜷𝒁𝟎), the vector of discrimination parameters in the slow and fast classes (𝜶), the vector of responses of person p to all items (𝑿𝒑), and the vector of classifications of all the responses of person p (𝒁𝒑).

The estimation was implemented partially in R and partially in C. For each design 50 replications were conducted. The parameter values were sampled with the Gibbs sampling algorithm for 10,000 iterations in total (1,000 iterations of the R loop, each running 10 iterations of the C routine). The R code can be found in Appendix A.
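The structure of such a run — iterate, update each parameter from its conditional distribution, discard a burn-in, and summarize the retained draws — can be sketched generically. This toy example (Python, with a made-up conditional sampler) mirrors only the loop structure, not the actual RMM conditionals:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_mu_conditional(x, sigma):
    """Toy conditional posterior for a normal mean under a flat prior."""
    return rng.normal(x.mean(), sigma / np.sqrt(len(x)))

x = rng.normal(2.0, 1.0, size=200)   # synthetic data with true mean 2
n_iter, burn_in = 1000, 500          # keep draws only after the burn-in
draws = []
for it in range(n_iter):
    mu = sample_mu_conditional(x, 1.0)
    if it >= burn_in:
        draws.append(mu)

posterior_mean = float(np.mean(draws))
```

Discarding the first half of the iterations as burn-in mirrors the convention used in the appendix code (`if(It>500)`).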

Results

In this subsection, the parameter estimates are discussed first. For the Baseline design, the true values of the difficulty and time intensity parameters are plotted against the posterior means of the estimated values, with 95% credible intervals (Figure 2).


In Figure 3 the distribution of the sampled values of the fast and slow discrimination parameters is plotted against the true value of the discrimination parameter in the Baseline design. The bias and variance for all designs are reported in Tables 3a and 3b. Second, the performance of the model in classifying the responses is discussed: Table 4 reports the proportion of correctly classified responses for all designs.

Parameter Estimation

Figure 2.

Plots of Beta and Xi in the Fast and Slow Classes in the Baseline Design.

Note: These plots show the average of the estimated values plotted against the true values.

As can be seen in Figure 2, the parameter estimates converged to the true values. The true value of the time intensity was within the 95% CI for both the slow and fast parameters, and the true value of the difficulty was within the 95% CI for almost every item. The CIs for the time intensities are very small in comparison to those for the difficulty parameters, which is due to the small variance in the estimates of the time intensity parameters.


Figure 3.

The Distribution of the Alpha Parameters in the Fast and Slow Classes in the Baseline Design.

In Figure 3 the distribution of the sampled values of the fast and slow discrimination parameters is plotted against the true value of the discrimination parameter in the Baseline design. As can be seen from the position of the distributions relative to the true value, the estimates are slightly biased: a negative bias for the fast discrimination parameter and a positive bias for the slow discrimination parameter.

Table 3a

The Bias of the Item Parameters per Design

Design                          𝜶𝟏       𝜶𝟎       𝜷𝟏        𝜷𝟎        𝝃𝟏        𝝃𝟎
Baseline                        0.064    0.073    0.02539   0.056     0.02244   0.010
Instable Transition             0.065    0.008    0.0385    0.039     0.0327    0.019
No Markov                       0.042    0.025    0.0967    0.001     0.04782   0.004
Small Number of Subjects        0.049    0.088    0.0686    0.068     0.04897   0.029
XS Number of Subjects           0.047    0.022    0.0765    0.103     0.00171   0.002
Small Number of Items           0.038    0.113    0.0907    0.497     0.06676   0.396
Small Difference Between ξ      0.049    0.033    0.013     0.040     0.247     0.009
No Difficulty Difference        0.029    0.473    0.2403    0.009     0.00514   0.001
No Discrimination Difference    0.015    0.052    0.00338   0.00023   0.00343   0.009

Note: The bias was calculated by subtracting the mean of the estimated values per item (over replications) from the true value of the parameter for each item. The reported value is the average of the absolute bias per parameter.
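The computation described in this note can be sketched as follows (Python for illustration; the true values and estimates are invented):

```python
import numpy as np

# Per item: mean of the estimates over replications minus the true value;
# the sign is irrelevant once the absolute value is taken. Then the absolute
# biases are averaged across items. Numbers below are made up.
true_beta = np.array([-0.5, 0.0, 0.8])            # true values for 3 items
estimates = np.array([[-0.45, 0.05, 0.70],        # replication 1
                      [-0.55, -0.05, 0.75],       # replication 2
                      [-0.40, 0.10, 0.65]])       # replication 3

per_item_bias = estimates.mean(axis=0) - true_beta
avg_abs_bias = float(np.abs(per_item_bias).mean())
```

Averaging the absolute per-item biases prevents positive and negative biases from cancelling out in the summary value.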


The biases of all the parameters in all designs were relatively small. A relatively small bias was expected for all the parameters in the Baseline design, and this result was found, with even smaller biases in the No Discrimination Difference design. The relatively larger biases were found in the Small Number of Items design, where all parameter estimates had a relatively large bias; this might be due to the way in which the true parameters were simulated. In the Small Difference Between ξ design the bias of the time intensity parameter for the slow class was relatively large. In the No Difficulty Difference design the bias of the difficulty parameter in the slow class was relatively large, as was that of the discrimination parameter for the fast class. Based on the biases, the parameter estimates in the Small Number of Subjects and the Extra Small Number of Subjects designs were quite good, although the variance is expected to be relatively large for these designs.

Table 3b

The Variance of the Item Parameters per Design

Design                          𝜶𝟏        𝜶𝟎       𝜷𝟏       𝜷𝟎       𝝃𝟏       𝝃𝟎
Baseline                        0.0027    0.0034   0.0042   0.0044   0.0147   0.0058
Instable Transition             0.0053    0.0012   0.0012   0.0035   0.0011   0.0021
No Markov                       0.0029    0.0028   0.0064   0.0027   0.0017   0.0015
Small Number of Subjects        0.0037    0.0010   0.0290   0.0628   0.0577   0.0483
XS Number of Subjects           0.1033    0.0313   0.1429   0.2049   0.1909   0.0197
Small Number of Items           0.0230    0.059    0.040    0.0456   0.0667   0.0452
Small Difference Between ξ      0.0968    0.1436   0.4069   0.0119   0.0001   0.0002
No Difficulty Difference        0.00024   0.0002   0.0004   0.0003   0.0001   0.210
No Discrimination Difference    0.0004    0.0012   0.0001   0.2100   0.1390   0.0001


As expected, the variances of the parameter estimates in the Small Number of Subjects, Extra Small Number of Subjects and Small Number of Items designs are relatively large.

Classification of Responses

Table 4

Proportion of Correctly Classified Responses per Design

Design                          Proportion Correctly Classified
Baseline                        0.81
Instable Transition             0.72
No Markov                       0.72
Small Number of Subjects        0.78
XS Number of Subjects           0.78
Small Number of Items           0.73
Small Difference Between ξ      0.71
No Difficulty Difference        0.78
No Discrimination Difference    0.74

Table 4 displays the proportion of correctly classified responses for each design. As expected, the model performed best in the Baseline design, with a proportion of correctly classified responses of .81. The model did not appear to be much influenced by a small sample size: the proportion of correctly classified responses did not differ much between the Small and Extra Small sample size designs, although the variance of the estimates was large. The model performed worst when a small difference between the time intensity parameters was specified. Since the model proposes two different distributions for slow and fast responses, and the separation between these distributions is defined by the difference between the time intensities, the relatively small proportion of correctly classified responses was expected. The model also performed relatively poorly in the Small Number of Items design, which was expected given the parameter estimates.
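The classification criterion can be sketched as an element-wise comparison between the true simulated classes and the classes assigned by the model (Python for illustration; the matrices are invented):

```python
import numpy as np

# Proportion of correctly classified responses: compare the model's class
# labels with the true simulated labels over all person-by-item responses.
Z_true = np.array([[1, 0, 0, 1],
                   [0, 0, 1, 1]])   # simulated slow(1)/fast(0) classes
Z_hat  = np.array([[1, 0, 1, 1],
                   [0, 0, 1, 0]])   # classes assigned by the model

prop_correct = float((Z_true == Z_hat).mean())
```

In the simulation study this comparison is possible because the true class of every response is known by construction.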

Discussion

The RMM has proven to perform well in classifying slow and fast responses. The simulation study showed that the model performed best when a large number of subjects and items was available and the difference between the slow and fast time intensity parameters was large. Another important factor was the transition probabilities: the model performs best when the transition probabilities in the latent Markov chain of the response classes are stable. The two most important factors appeared to be the difference between the time intensity parameters and the number of items. Less important were non-differing difficulty and discrimination parameters: in the designs where those parameters did not differ between classes, the model still performed relatively well.


The Response Mixture Model (Molenaar, Oberski, Vermunt & De Boeck, 2016) has proven to perform well in differentiating between slow and fast responses across various designs, based on response time and response accuracy. As stated previously, this may be due to differences in the cognitive processes preceding a response: fast responses may arise from faster processes such as full information retrieval, and slow responses may arise from slower processes such as partial information retrieval combined with computation in working memory (Partchev & De Boeck, 2012).

Information on the processes underlying responses is not only of great value for making inferences at the individual level or for making comparisons between respondents. Insight into the type of cognitive process that led to a particular response can also provide more information on the relation between cognitive processes and the psychometric measurement of ability. This is a fundamental part of establishing validity as proposed by Borsboom (2006). In Borsboom's causal account of validity, it is argued that one can only speak of a valid test when it is certain that (a) the latent variable exists and (b) variation in the latent variable leads to variation in the observed variable. He suggests that validity must be established by investigating the underlying response process, to ensure that the latent variable has a causal influence on the observed variable.


However, to speak of causation, certain conditions must be met. David Hume (Strawson, 2014) specified three conditions that are necessary for the existence of a causal relationship. First, for event x to have a causal influence on event y, events x and y should be contiguous in space and time. Second, event x should precede event y. Finally, event x should be necessary for event y to exist: if and only if event x happens, event y happens.

This leads to a problem concerning what is measured by an item (i.e., what the latent variable or ability is) and whether what is measured has a causal influence on the observed score on that item. As stated by Molenaar, Maris, Kievit and Borsboom (2012), a test is said to measure an ability if it requires some of that ability. When considering a particular item to which responses can be slow or fast, the fast response process relies on a different set of cognitive skills than the slow response process (Partchev & De Boeck, 2012). In other words, the product of cognitive functioning might be the same if accuracy is the same, but the process preceding the response may differ. Thus, an item that is answered slowly (and correctly) may require a different ability, or part of an ability, than an item that is answered fast (and correctly). Bluntly put, this would mean that an item on which respondents can be fast or slow may not measure the same cognitive functioning or ability in all situations.

If more than one ability, or part of an ability, influences the observed score, one cannot infer causality between the ability and the observed score. Taking the third condition for the existence of causality (Strawson, 2014) into account, one can only infer causality if and only if the ability causes the observed score. Since different mechanisms are involved in the ability, and these may differ across persons, it is not possible to know what causes the observed score: a person might answer an item correctly due to a great competence in memorizing things, or because he computed the item fully in working memory.

Physics has encountered this type of problem as well. Consider, for example, an antique thermometer, one that is not yet independent of air pressure. High in the Himalayas the thermometer reads −15 degrees Celsius; in the Dutch polders it also reads −15 degrees Celsius. According to the thermometer the temperature seems the same in both places, but the difference in air pressure between the two places has to be taken into account. The mercury in the thermometer reacts to increases and decreases in temperature, but also to increases and decreases in air pressure. Since the air pressure is much lower high in the Himalayas, the same reading does not represent the same temperature. This example illustrates the importance of knowing the process involved in a response: if the process differs between situations, it is likely that the product does not represent the same latent variable. One observed variable can then be caused by multiple latent variables, so one cannot assume causality between one of the latent variables and the observed variable, and therefore one cannot say the measurement instrument is valid.

Ferdinando II de' Medici solved this problem in 1654 by inventing a sealed, air pressure independent thermometer. Assuming the mercury in the thermometer is only affected by pressure and temperature, this operation led to the isolated measurement of temperature: the only variable that influenced the mercury was temperature, because the air pressure was kept constant. One could then assume a causal influence of temperature on the mercury and conclude that the thermometer was a valid measurement instrument.

Since it is not yet possible to control tests in such a way that only one isolated process is measured, another way must be found to account for possible differences in the underlying response processes and in what is measured. Since in IRT a response is influenced by item parameters and person parameters, allowing the item parameters to differ between the states may correct for the differing underlying processes. Therefore, having class specific item parameters and a different measurement model for each process seems to be a step in the right direction.
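The idea of class specific item parameters can be illustrated with a 2PL response probability that switches its discrimination and difficulty with the response class. The parameter values below are assumptions for the example, not estimates from the thesis:

```python
import numpy as np

# Class-specific 2PL sketch: the probability of a correct response uses the
# discrimination and difficulty of whichever class (slow or fast) the
# response falls in. Index 1 = slow class, 0 = fast class (illustrative).
def p_correct(theta, z, alpha=(1.5, 1.0), beta=(0.0, 0.3)):
    """2PL probability with class-specific parameters; z selects the class."""
    a, b = alpha[z], beta[z]
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

p_fast = p_correct(theta=0.5, z=0)   # fast-class item parameters
p_slow = p_correct(theta=0.5, z=1)   # slow-class item parameters
```

With the same ability, the two classes yield different success probabilities, which is exactly what a single common measurement model cannot express.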

References

Bolsinova, M., & Tijmstra, J. (2015). Can Response Speed Be Fixed Experimentally, and Does This Lead to Unconfounded Measurement of Ability?. Measurement: Interdisciplinary Research and

Perspectives, 13(3-4), 165-168.

De Boeck, P., & Partchev, I. (2012). IRTrees: Tree-based item response models of the GLMM family. Journal of Statistical Software, 48(1), 1-28.

De Ayala, R. J. (2013). The theory and practice of item response theory. Guilford Publications.

DeMars, C. (2010). Item response theory. Oxford University Press, USA.

Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and psychological measurement, 58(3), 357-381.

Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of experimental psychology,47(6), 381.

Goldhammer, F. (2015). Measuring Ability, Speed, or Both? Challenges, Psychometric Solutions, and What Can Be Gained from Experimental Control. Measurement: Interdisciplinary Research

and Perspectives, 13(3-4), 133-164.

Levine, G., Preddy, D., & Thorndike, R. L. (1987). Speed of information processing and level of cognitive ability. Personality and Individual Differences, 8(5), 599-607.

van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287-308.


van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46(3), 247-272.

van der Maas, H. L., & Jansen, B. R. (2003). What response times tell of children’s behavior on the balance scale task. Journal of Experimental Child Psychology, 85(2), 141-177.

Maris, G., & Van der Maas, H. (2012). Speed-accuracy response models: Scoring rules based on response time and accuracy. Psychometrika, 77(4), 615-633.

Molenaar, D. (2015). The Value of Response Times in Item Response Modeling. Measurement:

Interdisciplinary Research and Perspectives, 13(3-4), 177-181.

Molenaar, D., Bolsinova, M., Rozsa, X., De Boeck, P. (in press) ??

Molenaar, D., Oberski, X., Vermunt, X., De Boeck, P. (in press) ??

Molenaar, D., Tuerlinckx, F., & Maas, H. L. J. (2015). Fitting diffusion item response theory models for responses and response times using the R package diffIRT. Journal of Statistical

Software, 66(4).

Partchev, I., & De Boeck, P. (2012). Can fast and slow intelligence be differentiated?. Intelligence, 40(1), 23-32.

Roskam, E. E. (1987). Toward a psychometric theory of intelligence. Progress in Mathematical Psychology, 1, 151-174.

Strawson, G. (2014). The Secret Connexion: Causation, Realism, and David Hume: Revised Edition. OUP Oxford.

Verhelst, N. D., Verstralen, H. H., & Jansen, M. G. H. (1997). A logistic model for time-limit tests (pp. 169-185). Springer New York.

Thorndike, E. L., Bregman, E. O., Cobb, M. V., & Woodyard, E. (1926). The measurement of intelligence.


Appendix A

Time1 <- Sys.time()
set.seed(as.integer(Time1)) # set.seed requires an integer seed

#Functions used in sampling

sample_mu <- function(x, s) { # sample the mean vector of the item parameters
  k <- ncol(x)
  m0 <- rep(0, k)    # prior mean vector
  s0 <- diag(100, k) # prior covariance matrix
  prec <- solve(s0) + nrow(x) * solve(s)
  m <- mvrnorm(1,
               solve(prec) %*% (solve(s0) %*% m0 + nrow(x) * solve(s) %*% colMeans(x)),
               solve(prec))
  return(m)
}

sample_S <- function(theta, mu) { # sample the covariance matrix of the item parameters
  nS <- ncol(theta) # number of dimensions
  nP <- nrow(theta) # sample size
  mu <- matrix(mu, ncol = 1, nrow = nS)
  S <- matrix(0, nS, nS) # sum-of-squares matrix
  for (i in 1:nP) {
    x <- matrix(theta[i, ], ncol = 1, nrow = nS) - mu
    S <- S + x %*% t(x)
  }
  S <- solve(S + diag(nS)) # add the identity scale matrix of the prior
  # sampling from the inverse-Wishart (see Gelman et al., 1995)
  Z <- mvrnorm(n = nP + nS + 2, mu = rep(0, nS), Sigma = S)
  T <- matrix(0, nS, nS)
  for (i in 1:(nP + nS + 2)) {
    z <- matrix(Z[i, ], ncol = 1, nrow = nS)
    T <- T + z %*% t(z)
  }
  return(solve(T)) # output: new covariance matrix
}

setwd("C:/Users/Zenab/Desktop/GST")

dyn.load('FS.dll') # loading C-code. It should be located in your working directory.

for (D in 7:7){

#_____________________DATA GENERATING PART 1: PARAMETERS
# Residual
res.sigma.true <- 0.2

# Design matrix: one column per design; rows are
# 1: sample size, 2: number of items, 3: alpha fast, 4: alpha slow,
# 5: beta fast, 6: beta slow, 7: xi fast, 8: xi difference,
# 9: pi11, 10: pi10, 11: pi1 (starting probability)
DM <- matrix(c(1000, 1000, 1000, 500, 250, 1000, 1000, 1000, 1000, 1000,
               40, 40, 40, 40, 40, 20, 40, 40, 40, 40,
               1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1, 1, 1.5,
               1, 1, 1, 1, 1, 1, 1, 1, 1.5, 1,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
               1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
               log(res.sigma.true), log(res.sigma.true), log(res.sigma.true),
               log(res.sigma.true), log(res.sigma.true), log(res.sigma.true),
               log(0.5 * res.sigma.true), log(res.sigma.true),
               log(res.sigma.true), log(res.sigma.true),
               0.85, 0.6, 0.5, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85,
               0.15, 0.4, 0.5, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15,
               0.5, 0.5, 0.7, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
             11, 10, byrow = TRUE)

# sample size
N <- DM[1, D]
# number of items
n <- DM[2, D]
# values of the alpha parameters
alfa.f.true <- DM[3, D]
alfa.s.true <- DM[4, D]

# xi and beta from a multivariate normal
library(MASS)
covmat_xi_beta <- matrix(c(1, 0.5, 0.5, 0,
                           0.5, 1, 0, 0,
                           0.5, 0, 1, 0,
                           0, 0, 0, 0.001), 4, 4, byrow = TRUE)
mu_vect_xi_beta <- c(DM[5:8, D])
XB <- mvrnorm(n = n, mu = mu_vect_xi_beta, Sigma = covmat_xi_beta, empirical = TRUE)
beta.f.true <- XB[, 1]
beta.s.true <- XB[, 2]
xi.f.true <- XB[, 3]
xi.diff.true <- XB[, 4]
xi.s.true <- xi.f.true + exp(xi.diff.true)

# Transition probabilities (instable in the Instable Transition design)
pi11 <- DM[9, D]
pi10 <- DM[10, D]
# starting probability
pistart.true <- DM[11, D]
# matrix with starting values of the classes
C <- matrix(NA, N, n)
C[, 1] <- rbinom(N, 1, pistart.true)

# Computing the class of each response per person as a Markov chain: 1 = slow, 0 = fast
for (j in 1:N) {
  for (i in 1:(n - 1)) {
    if (C[j, i] == 0) {
      C[j, i + 1] <- rbinom(1, 1, pi10)
    } else if (C[j, i] == 1) {
      C[j, i + 1] <- rbinom(1, 1, pi11)
    }
  }
}
Z.true <- C

# Beta parameter per person per item, dependent on the state
B <- matrix(NA, N, n)
for (j in 1:N) {
  for (i in 1:n) {
    if (C[j, i] == 1) {
      B[j, i] <- beta.s.true[i]
    } else {
      B[j, i] <- beta.f.true[i]
    }
  }
}

# Alpha parameter per person per item, dependent on the state
A <- matrix(NA, N, n)
for (j in 1:N) {
  for (i in 1:n) {
    if (C[j, i] == 1) {
      A[j, i] <- alfa.s.true
    } else {
      A[j, i] <- alfa.f.true
    }
  }
}

# Xi per person per item, dependent on the state
XI <- matrix(NA, N, n)
for (j in 1:N) {
  for (i in 1:n) {
    if (C[j, i] == 1) {
      XI[j, i] <- xi.s.true[i]
    } else if (C[j, i] == 0) {
      XI[j, i] <- xi.f.true[i]
    }
  }
}

# Theta and tau per person, with a correlation of about 0.6
library(MASS)
pp <- mvrnorm(n = N, mu = c(0, 0),
              Sigma = matrix(c(1, 0.20, 0.20, 0.1), 2, 2, byrow = TRUE))
theta.true <- pp[, 1]
tau.true <- pp[, 2]
# tau as a matrix (one column per item, constant within a person)
taum <- matrix(tau.true, N, n, byrow = FALSE)

# Probability of a correct response according to a 2PL IRT model with the Markov property
P <- matrix(NA, N, n)
for (j in 1:N) {
  for (i in 1:n) {
    P[j, i] <- exp(A[j, i] * (theta.true[j] - B[j, i])) /
      (1 + exp(A[j, i] * (theta.true[j] - B[j, i])))
  }
}

#Start FORLOOP for Sampling from HERE.


#_____________________DATA GENERATING PART 2: DATA
# Response matrix X according to the 2PL IRT model with the Markov property
X <- matrix(nrow = N, ncol = n)
for (j in 1:N) {
  for (i in 1:n) {
    X[j, i] <- rbinom(n = 1, size = 1, prob = P[j, i])
  }
}

# Response time matrix T as xi minus tau, plus residual noise
T <- matrix(NA, N, n)
for (j in 1:N) {
  for (i in 1:n) {
    T[j, i] <- rnorm(1, XI[j, i] - taum[j, i], res.sigma.true)
  }
}

#SAMPLING CONTINUES
# First, starting values are generated for all the parameters
Z <- matrix(rbinom(N * n, 1, 0.5), N, n)
rho <- 0
st <- 0.5 # standard deviation of speed
t <- mvrnorm(N, c(0, 0), matrix(c(1, rho * st, rho * st, st^2), 2, 2)) # col 1: theta, col 2: tau
rho <- cor(t[, 1], t[, 2])
st <- sd(t[, 2])
b <- cbind(rnorm(n, 0, 2), rnorm(n, 0, 2))        # fast and slow beta (difficulties)
xi <- cbind(rnorm(n, 3, 0.5), rlnorm(n, -1, 0.2)) # fast xi and the slow-fast xi difference
m <- colMeans(cbind(b, xi[, 1], log(xi[, 2])))    # mean vector of the item parameters
s.i <- cov(cbind(b, xi[, 1], log(xi[, 2])))       # covariance matrix of the item parameters
si <- rep(0.1, n)                                 # residual variance of the logRT
a <- c(1, 1)                                      # discrimination slow and discrimination fast
pi0 <- 0.5                                        # starting probability
piTrans <- c(0.5, 0.5)                            # transition probabilities: from 0 to 1, and from 1 to 1
# ss and s.cond are re-structured elements of the item covariance matrix
ss <- NULL
for (k in 1:ncol(s.i)) { ss <- c(ss, s.i[k, -k] %*% solve(s.i[-k, -k])) }
s.cond <- NULL
for (k in 1:ncol(s.i)) {
  s.cond <- c(s.cond, s.i[k, k] - s.i[k, -k] %*% solve(s.i[-k, -k]) %*% s.i[-k, k])
}

# Objects to store the sampled values
A <- NULL      # discriminations
PI <- NULL     # probabilities
Xi <- NULL     # fast xi and the slow-fast xi difference
Si <- NULL     # residual variances of the logRT
B <- NULL      # fast and slow difficulties
Theta <- NULL  # abilities
Tau <- NULL    # speed
ZZ <- NULL     # class memberships
Rho <- NULL
St <- NULL

for (It in 1:1000) {
  # inside the C routine the thetas, taus, alphas, betas, xis, sigmas, pis and Zs are sampled
  tmp <- .C('Gibbs', as.integer(X), as.integer(Z), as.double(T), as.double(t),
            as.double(a), as.double(b), as.double(xi), as.double(si),
            as.integer(N), as.integer(n), as.double(m), as.double(ss),
            as.double(s.cond), as.double(pi0), as.double(piTrans),
            as.double(rho), as.double(st))
  # all these objects are vectors
  Z <- tmp[[2]]   # class memberships: first all responses to item 1, then item 2, and so on
  t <- tmp[[4]]   # first all thetas, then all taus
  a <- tmp[[5]]
  b <- tmp[[6]]
  xi <- tmp[[7]]
  si <- tmp[[8]]
  pi0 <- tmp[[14]]
  piTrans <- tmp[[15]]
  rho <- tmp[[16]]
  st <- tmp[[17]]
  # sample the mean vector and the covariance matrix of the item parameters
  m <- sample_mu(cbind(b[1:n], b[(n + 1):(2 * n)], xi[1:n], log(xi[(n + 1):(2 * n)])), s.i)
  s.i <- sample_S(cbind(b[1:n], b[(n + 1):(2 * n)], xi[1:n], log(xi[(n + 1):(2 * n)])), m)
  ss <- NULL
  for (k in 1:ncol(s.i)) { ss <- c(ss, s.i[k, -k] %*% solve(s.i[-k, -k])) }
  s.cond <- NULL
  for (k in 1:ncol(s.i)) {
    s.cond <- c(s.cond, s.i[k, k] - s.i[k, -k] %*% solve(s.i[-k, -k]) %*% s.i[-k, k])
  }
  # saving the sampled values after the burn-in (half of the iterations)
  if (It > 500) {
    A <- cbind(A, a)
    B <- cbind(B, b)
    Xi <- cbind(Xi, xi)
    Si <- cbind(Si, si)
    PI <- cbind(PI, c(pi0, piTrans))
    Theta <- cbind(Theta, t[1:N])
    Tau <- cbind(Tau, t[(N + 1):(2 * N)])
    Rho <- c(Rho, rho)
    St <- c(St, st)
    ZZ <- cbind(ZZ, Z) # first N rows are the classifications of responses to item 1, etc.
  }
}
} # closes the design loop opened with for (D in 7:7)
