Modeling Response Times: Distinguishing Fast and Slow Strategies

Jessica Vera Schaaf

University of Amsterdam

Modeling Response Times

Since computerized testing gained popularity, response times have become available in addition to responses (Molenaar, Oberski, Vermunt, & De Boeck, in press). These response times are assumed to represent the time it took for a process to start, develop, and end (Molenaar, 2015). Until now, research has focused on how to incorporate this additional source of information about individual differences into existing measurement models (Molenaar et al., in press). The most common way to combine a measurement model for response times with the measurement model for response accuracy is the hierarchical generalized linear modeling framework. This hierarchical framework has two lower-level models as well as two higher-level models. The two lower-level models contain ability and speed parameters for the test takers and difficulty and time intensity parameters for the items, whereas the two higher-level models describe the distributions of the person parameters in the population of test takers and of the item parameters in the domain of test items (Van der Linden, 2009). The hierarchical generalized linear modeling framework thus links response time and response accuracy by introducing model components for both variables for both persons and items. The development of the hierarchical framework was a first step in developing joint models for accuracy and response time, and the framework has been applied successfully in educational and psychological testing (Van der Linden, 2008; Van der Linden & Guo, 2008; Klein Entink, Kuhn, Hornke, & Fox, 2009; Goldhammer & Klein Entink, 2011; Loeys, Rossel, & Baten, 2011; Petscher, Mitchell, & Foorman, 2014; Scherer, Greiff, & Hautamäki, 2015).

However, the hierarchical generalized linear modeling framework does not allow some research questions to be answered, because it focuses solely on inter-individual differences. Using the hierarchical framework one can describe differences between respondents in response accuracy and response time conditional on ability and speed, but one cannot determine how certain within-person processes unfold. In psychological research these within-person processes, of which responses are seen as the result (Molenaar, 2015), form a main point of interest. Therefore, a model that is used to explain human response processes should include an intra-individual component.

In the last couple of years, the field that develops such intra-individual models has flourished. Multiple alternative models have become available that incorporate response accuracy and response time at the intra-individual level, and several hypotheses have been put forward in the literature. For example, as Molenaar (2015) argues, differences in response strategy might be reflected in differences in response time: fast strategies will result in short response times, whereas slow, more time-consuming strategies will result in longer response times. It might even be the case that these different strategies are the result of different underlying abilities (Partchev & De Boeck, 2012). Consequently, we need to differentiate between the accuracy of fast responses, which are based on more automatic, direct-link mediated processing (Shiffrin & Schneider, 1977), and the accuracy of slow responses, which are based on repeating one's cognitive work and/or more controlled processing (Shiffrin & Schneider, 1977), instead of just measuring speed and accuracy and focusing solely on inter-individual differences. Although this might sound plausible, implementing the distinction between fast and slow responses is not as easy as one might think. Where do you draw the line? What makes a response fast? Should you incorporate the overall speed level of a person? Or should you be even more specific and look at every single person at every single item to make inferences about the speed of the response?

In this paper four joint models are discussed and compared. First, two models are presented in which the distinction between fast and slow responses is made using a median split of response times. It is argued that this is not the best way to classify responses, and two other methods of classification are therefore discussed: the Response Time Based Predictive Class Model, which uses the residual response time as a predictor of response class membership (Molenaar, Bolsinova, Rozsa, & De Boeck, 2016), and the Mixture Model, which uses two distributions, one for the slow class and one for the fast class, to determine which class membership is more likely. The Mixture Model is argued to be the most plausible of the four; therefore, at the end of the paper, a simulation study using the Mixture Model is presented to investigate parameter recovery.


Classification using a Median Split

Let us start with a convenient way of distinguishing fast and slow responses. Although the notion of a difference between fast and slow responses was already present, little attention had been paid to how to distinguish these responses. Partchev and De Boeck (2012) wondered whether larger response times mean that more of the same processes are executed, or that larger response times imply a different kind of processing. They hypothesized that differences between fast and slow responses that go beyond differences in overall level of difficulty imply more than just going faster or slower through the same processes. In other words, if the item difficulties do not capture the differences in response times (i.e., not all slower responses have a higher difficulty), something else must be going on than just speeding up or slowing down the same process. To explain this phenomenon, Partchev and De Boeck (2012) include a model in their application that contains two different abilities as well as two different sets of difficulties for fast and slow responses. To test their theory they also fitted several constrained models: a model in which the fast and slow ability do not differ (θ_f = θ_s) and a model in which there is only one set of difficulties for fast and slow responses (β_f = β_s).

In this model, as well as in the following three models, the probability of a correct response was determined using a two-parameter logistic model:

f(X_{pi} \mid Z_{pi}, \theta_p, \boldsymbol{\alpha}, \boldsymbol{\beta}_i) = \left( \frac{\exp\{(\alpha_s \theta_p - \beta_{is}) X_{pi}\}}{1 + \exp(\alpha_s \theta_p - \beta_{is})} \right)^{Z_{pi}} \left( \frac{\exp\{(\alpha_f \theta_p - \beta_{if}) X_{pi}\}}{1 + \exp(\alpha_f \theta_p - \beta_{if})} \right)^{1 - Z_{pi}} \quad (1)

where X_pi is the response of person p on item i, Z_pi is the class of that response, θ_p is the ability of person p, α is the discrimination vector containing the discrimination in either the fast (α_f) or the slow (α_s) class, and β_i is the difficulty vector containing the difficulty of item i in either the fast (β_if) or the slow (β_is) class. Note that two different discrimination parameters are used for the fast and the slow class, but that the discrimination is equal across items within a class; this is done to simplify the model estimation. Furthermore, to test their theory Partchev and De Boeck (2012) needed to make a distinction between fast and slow responses: if the responses are not properly separated, there is no way to support the claim that two different abilities exist. What Partchev and De Boeck (2012) did is split the response times into two equally sized groups. They took the response time of each person on each item and drew a line at the median response time. The class Z_pi is then derived in the following way:

Z_{pi} = \begin{cases} 1 & \text{if } T_{pi} \geq \text{median} \\ 0 & \text{if } T_{pi} < \text{median} \end{cases} \quad (2)

where Z_pi is the class of the response of person p on item i, with Z_pi = 1 indicating a slow response and Z_pi = 0 indicating a fast response, and T_pi is the response time of person p on item i.
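To make the classification rule concrete, a minimal NumPy sketch of Equations 1 and 2 is given below. The function and variable names are illustrative and not part of the original model specification; the example values are made up.

```python
import numpy as np

def p_correct(theta, alpha_f, alpha_s, beta_f, beta_s, z):
    """Probability of a correct response under Equation 1.

    theta : ability of the person
    z     : 1 for a slow response, 0 for a fast response
    The class-specific linear predictor is alpha * theta - beta,
    as in the two-parameter logistic model used in the text.
    """
    eta = np.where(z == 1, alpha_s * theta - beta_s,
                           alpha_f * theta - beta_f)
    return 1.0 / (1.0 + np.exp(-eta))

def median_split(response_times):
    """Equation 2: slow (Z = 1) if the response time is at or above the median."""
    rt = np.asarray(response_times, dtype=float)
    return (rt >= np.median(rt)).astype(int)

# Illustrative use with made-up values
rt = np.array([2.1, 0.8, 3.5, 1.2, 4.0])
z = median_split(rt)
print(z, p_correct(theta=0.3, alpha_f=1.0, alpha_s=2.0,
                   beta_f=-1.0, beta_s=1.0, z=z))
```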

The median split can be done in two ways. The first is the intra-individual median split. In this method, a fast response is a response that belongs to the fastest half of responses of the person in question (Partchev & De Boeck, 2012). All response times of a single person are ordered from fastest to slowest and the class of each response is then determined by a median split. As a consequence, every person has an equal number of fast and slow responses. One thereby assumes that every person is equal regarding speed, as a consequence of assuming that every person has the same preference for the fast or the slow class. This is probably not the case in real life: some persons might be fast on all items, while others might respond to most of the items slowly.

To avoid this, another type of median split can be used: the inter-individual median split. Now a fast response is a response that belongs to the fastest half of responses to the item in question (Partchev & De Boeck, 2012). All responses to an item are ordered from fastest to slowest and the median of these response times is taken as the cutoff point. Since the median of the response times on a single item is used, it is now assumed that every item has an equal number of fast and slow respondents. As was the case with the intra-individual median split, this assumption might not hold in practice: some items might be answered quickly by all participants, or the other way around.
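The two variants differ only in the axis along which the median is taken. The following sketch (with illustrative names and made-up response times) shows both splits applied to a persons-by-items matrix of response times.

```python
import numpy as np

def intra_individual_split(rt):
    """Median split within each person (row): each person gets an equal
    number of fast (0) and slow (1) responses."""
    medians = np.median(rt, axis=1, keepdims=True)
    return (rt >= medians).astype(int)

def inter_individual_split(rt):
    """Median split within each item (column): each item gets an equal
    number of fast (0) and slow (1) respondents."""
    medians = np.median(rt, axis=0, keepdims=True)
    return (rt >= medians).astype(int)

# rt is a persons-by-items matrix of response times (illustrative values)
rt = np.array([[1.2, 3.4, 2.0],
               [0.9, 1.1, 4.2]])
print(intra_individual_split(rt))
print(inter_individual_split(rt))
```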

As Partchev and De Boeck (2012) note themselves, differentiating between fast and slow responses through a median value is rather arbitrary. The response times are treated as discrete, whereas response time is of course continuous. It is also assumed that each subject uses the fast category as often as the slow category (intra-individual median split) or that for each item the fast category is used as often as the slow category (inter-individual median split) (Molenaar et al., in press). This will probably not be the case in most applications: there will be people who use only one of the strategies, and items that are answered solely in one of the modes. Therefore, other models for classifying responses as either fast or slow are given in the next section.

Classification using a Double Median Split

A first attempt to classify responses in a better way could be a double median split. In this method one does not choose between an intra-individual and an inter-individual median split, but makes the distinction between fast and slow responses based on the response of a single person on a single item. Both person and item specific characteristics are now taken into account. The class is determined by comparing the expected response time of person p on item i with its observed counterpart. This comparison is captured in the residual response time:

\epsilon_{pi} = \ln T_{pi} - (\xi_i - \tau_p) \quad (3)

with

\epsilon_{pi} \sim N(0, \sigma^2_i) \quad (4)

where ε_pi is the residual response time of the response of person p on item i, ln T_pi is the log-transformed response time of person p on item i, ξ_i is the time intensity of item i, and τ_p is the speed of person p. In addition, ε_pi is assumed to come from a normal distribution with a mean of zero and a variance equal to the residual variance (σ²_i).

Here the log-transformed observed response time is ln T_pi and the expected response time is ξ_i − τ_p. The residual response time illustrates how much measurement error there is when regressing the log-transformed response times on speed. A positive residual response time of person p on item i, for example, indicates that the response on this particular item is slower than expected given the speed of person p and the time intensity of item i. This residual term makes differences in response time between people with the same speed possible; if this parameter were not included, everybody with the same speed would have to respond to the same item equally fast, which is not plausible in real-life administration.

Based on the residual response time, the distinction between fast and slow responses using a double median split is made in the following way:

Z_{pi} = \begin{cases} 1 & \text{if } \epsilon_{pi} \geq 0 \\ 0 & \text{if } \epsilon_{pi} < 0 \end{cases} \quad (5)

where Z_pi is the class and ε_pi is the residual response time.
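For illustration, a small NumPy sketch of Equations 3 and 5 is given below; the input values and names are hypothetical.

```python
import numpy as np

def residual_log_rt(rt, xi, tau):
    """Equation 3: epsilon_pi = ln T_pi - (xi_i - tau_p).

    rt  : persons-by-items matrix of response times
    xi  : vector of item time intensities
    tau : vector of person speeds
    """
    return np.log(rt) - (xi[np.newaxis, :] - tau[:, np.newaxis])

def double_median_split(rt, xi, tau):
    """Equation 5: slow (Z = 1) if the residual response time is non-negative."""
    eps = residual_log_rt(rt, xi, tau)
    return (eps >= 0).astype(int)

# Illustrative values (not taken from the paper)
rt = np.array([[1.5, 4.0], [2.2, 1.0]])
xi = np.array([0.8, 1.1])
tau = np.array([0.1, -0.2])
print(double_median_split(rt, xi, tau))
```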

Although the double median split avoids the assumptions of the simple median splits that might not hold in practice, it does not remove the arbitrary nature of a median split. A double median split results in person and item specific decisions regarding class, so it is no longer assumed that every person is the same regarding speed or that for each item the fast and slow category are used equally often. What is not accomplished, however, is removing the arbitrary cut-off score: one still makes a deterministic distinction between fast and slow responses. As a consequence, the double median split is not an optimal solution for distinguishing fast from slow responses either.

The Response Time Based Predictive Class Model

Therefore another method of distinguishing fast from slow responses is considered. The Response Time Based Predictive Class Model differs from the previously discussed methods in that it introduces probabilities. The split between fast and slow responses is no longer deterministic, as was the case with the median splits, but becomes probabilistic. As the name of the model suggests, the response times are used to predict whether a response is fast or slow. Using parameters that indicate how likely it is for a response to end up in the fast or the slow class, every single response of every single person is assigned to a class. Note that the model thus builds on the double median split. A distinguishing feature of the Response Time Based Predictive Class Model is the introduction of the parameters ζ_1i and ζ_0i. These parameters are used to assess the probability that person p responds to item i slowly (Pr(Z_pi = 1)):

\Pr(Z_{pi} = 1) = \frac{\exp\{\zeta_{1i}(\epsilon_{pi} - \zeta_{0i})\}}{1 + \exp\{\zeta_{1i}(\epsilon_{pi} - \zeta_{0i})\}} \quad (6)

where ζ_1i denotes the faster-slower slope parameter, which indicates how the probability of ending up in the fast class versus the slow class depends on the residual response time, ε_pi is the residual response time, and ζ_0i is the intercept, which indicates where the distinction between the fast and slow class is made.

In other words, ζ_1i indicates per item how, for example, a positive residual response time relates to whether a response is classified as fast or slow. If ζ_1i is high, a slower response makes it much more likely for the response to end up in the slow class. This parameter is constrained to be non-negative to ensure that the parameters in the slower measurement model (α_s and β_is) correspond to the measurement properties of the slower responses (Molenaar et al., 2016). ζ_1i indicates how well the fast and the slow class are separated and therefore forms an indication of how good the residual response time is as a predictor of class membership. In other words, ζ_1i indicates how large the 'grey' area of classification is and thus how certain one can be that a response is indeed fast or slow. Note that ζ_1i is item specific, so a separate value for the certainty of the classification is estimated per item.

In addition, the parameter ζ_0i forms the intercept. It states where the distinction between the fast and the slow class is made, and thereby indirectly indicates whether it is more likely for a response to end up in the slow class than in the fast class. ζ_0i splits the distribution of residual response times at the point where, to the left of the split, the probability of a fast response is above .5 and, to the right of the split, the probability of a slow response is above .5. ζ_0i is item specific, which means that a separate value for ζ_0i is estimated for every item. Therefore, it is possible for the same residual response time, say ε_pi = 0.25, to be classified as fast on item 1 but as slow on item 2.
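As an illustration of how ζ_1i and ζ_0i govern the classification, a minimal sketch of Equation 6 is given below; the parameter values are made up and only serve to contrast a steep and a shallow slope.

```python
import numpy as np

def prob_slow(eps, zeta1, zeta0):
    """Equation 6: probability that a response falls in the slow class,
    given its residual response time eps and the item's slope (zeta1)
    and intercept (zeta0) parameters."""
    return 1.0 / (1.0 + np.exp(-zeta1 * (eps - zeta0)))

# A steep slope (large zeta1) gives an almost deterministic split at zeta0;
# a shallow slope leaves a wide 'grey' area. Values below are illustrative.
eps = np.array([-0.5, -0.1, 0.0, 0.1, 0.5])
print(prob_slow(eps, zeta1=10.0, zeta0=0.0))
print(prob_slow(eps, zeta1=1.0,  zeta0=0.0))
```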

Now that the parameters are clear, it becomes possible to compare this method of classification with the previously discussed median split. To classify a response as either fast or slow, the Response Time Based Predictive Class Model adds the parameters ζ_1i and ζ_0i; the residual response time (ε_pi) is thus no longer the only quantity on which the distinction is based. When one considers how the residual response time is incorporated in the determination of class membership in the Response Time Based Predictive Class Model, it stands out that its influence is now indirect. Whereas under the double median split responses that were slower than expected given the time intensity of the item and the speed of the person were considered slow responses, there are now multiple parameters that make the distinction between fast and slow responses.

However, if the values of the newly added parameters are constrained to ζ_1i = +∞ and ζ_0i = 0, the classification of the Response Time Based Predictive Class Model becomes identical to a double median split. With ζ_0i = 0, the probability of a negative residual response time being classified as a fast response is higher than the probability of it being classified as a slow response; for a positive residual response time the opposite is true. In addition, if ζ_1i is set to +∞ the separation between the two classes becomes perfect. In other words, there is no 'grey' area in which the classification of a response is uncertain: if the residual response time of person p on item i is below zero the response is classified as fast, and otherwise as slow.

As was argued earlier, the double median split is not the best way of classifying responses as either fast or slow. Therefore, by estimating the values of ζ_1i and ζ_0i instead of restricting them, extra information about the classification is gained. Hence, the Response Time Based Predictive Class Model is an improvement over the double median split: it makes more accurate decisions regarding class membership and it shows the degree of uncertainty with which the distinction is made.

Although the Response Time Based Predictive Class Model resolves the problems that resulted from using a median split, it uses one distribution for both fast and slow responses. This is a good starting point, but it does not seem plausible in real life: it is not likely that items have the same time intensity in the fast and in the slow class. Therefore, in the next section a model is discussed in which two separate distributions are used for the classes: the Mixture Model of fast and slow responses.

The Mixture Model

In the Mixture Model of fast and slow responses, not a single distribution is used to distinguish between the fast and the slow class, as was the case in the Response Time Based Predictive Class Model, but two separate distributions are introduced. For every item, the distribution of log-transformed response times is composed of two normal distributions with the time intensity of the item (ξ_i, in either the fast or the slow class) minus the speed of person p (τ_p) as the mean. The variance equals the variance of the residual response time (σ²_i). By using two separate distributions with a possible overlap, one can specify class specific characteristics and incorporate these in the distributions. This is not possible in the Response Time Based Predictive Class Model, where a single distribution is split in two.

By using two separate distributions, very fast and very slow responses will always be classified as such, but responses in the middle will sometimes end up in the slow class and sometimes in the fast class. This 'grey' area, the overlapping part of the distributions, represents the uncertainty of the response classification and is determined by the difference between the time intensities in the two classes. The bigger the overlapping area, the less certain one can be that a response is either fast or slow. Note that this overlapping area has the same function as the ζ_1i parameter in the Response Time Based Predictive Class Model. However, not only the response time itself is taken into account. In a Mixture Model it is assumed that an item specific latent class variable, Z_pi, underlies the response, X_pi, and the response time, T_pi, of respondent p on item i (Molenaar et al., in press), and, more importantly, that the latent state on item i may depend on the latent state on item i − 1 (Molenaar et al., in press). In this model, whether you are in the fast class or the slow class influences the chances of being in either the fast class or the slow class on the next item. Therefore, in contrast to the median split methods, the Mixture Model can account for people using solely the fast strategy or solely the slow strategy.

The Mixture Model thus classifies a single response of a single person based on the preceding response. In this way the response on item i is dependent on the response on item i − 1, but independent of the other preceding responses (i − 2, i − 3, etc.). This Markov property is used to decide how likely it is for this person to respond to the next item in a certain way.

To start, the distribution of the response times given the class (Z_pi), the speed of a person (τ_p), the time intensity of an item (ξ_i), and the residual variance of that item (σ²_i) is the following:

f(T_{pi} \mid Z_{pi}, \tau_p, \xi_i, \sigma^2_i) = \ln N\!\left(T_{pi};\; (\xi_{is} - \tau_p)^{Z_{pi}} (\xi_{if} - \tau_p)^{1 - Z_{pi}},\; \sigma^2_i\right) \quad (7)

Here the log-transformed response times are assumed to be normally distributed with a mean equal to the time intensity of item i (ξ_i) minus the speed of person p (τ_p) and a variance equal to the residual variance of that item (σ²_i). Note that there is a different time intensity parameter for the two classes.
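A small sketch of the class-conditional response time density in Equation 7 is given below, assuming NumPy and SciPy are available; the parameter values are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def rt_density(rt, z, tau, xi_f, xi_s, sigma2):
    """Equation 7: density of a response time given its class.

    The log response time is normal with mean (xi_s - tau) in the slow
    class (z = 1) and (xi_f - tau) in the fast class (z = 0), and
    variance sigma2. Parameter names follow the text.
    """
    mean = np.where(z == 1, xi_s - tau, xi_f - tau)
    # density of ln T; dividing by rt gives the log-normal density of T itself
    return norm.pdf(np.log(rt), loc=mean, scale=np.sqrt(sigma2)) / rt

# Illustrative values
print(rt_density(rt=2.0, z=1, tau=0.0, xi_f=1.0, xi_s=1.5, sigma2=0.2))
```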

The class memberships are derived from three probabilities. First, the probability of starting in the slow class is captured in π_1. This value is estimated as the proportion of responses to the first item that belong to the slow class. The parameter π_01 is the transition probability from the fast class to the slow class: it expresses how likely it is for the following response to be classified as slow given that the current response belonged to the fast class. Finally, π_11 is the probability of staying in the slow class: it gives the probability that someone who responded to item i − 1 in class Z_{p,i−1} = 1 responds to item i in the same class. As stated above, the Mixture Model can account for people using solely the fast strategy or solely the slow strategy; it does so through π_01 and π_11.

To come up with the class vector z, the Mixture Model uses the probability of each response belonging to one of the classes to find the most likely class sequence. It keeps track of the previous class memberships, whereby not only the responses but also the classes become dependent. The Mixture Model thus classifies responses as either fast or slow using two overlapping response time distributions. As with the Response Time Based Predictive Class Model, the Mixture Model is probabilistic: the classes are not separated perfectly but include a 'grey' area in which responses are sometimes classified as fast and sometimes as slow. In contrast to the Response Time Based Predictive Class Model, the Mixture Model does not use ζ_1i and ζ_0i, but introduces the transition probabilities π_01 and π_11 and defines the 'grey' area by the difference between the time intensity of an item in the fast class and the time intensity of the same item in the slow class.
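One way to obtain such a most likely class sequence is a Viterbi-style search, sketched below under the assumption that per-item class log-likelihoods are available. This is only an illustration of the idea; in the present study class memberships are sampled within the Gibbs sampler rather than computed with this algorithm.

```python
import numpy as np

def most_likely_classes(log_lik, pi1, pi01, pi11):
    """Viterbi-style search for the most likely class sequence of one person.

    log_lik : (n_items, 2) array of log-likelihoods of each response (and
              response time) under the fast (column 0) and slow (column 1) class
    pi1     : probability of starting in the slow class
    pi01    : probability of moving from the fast to the slow class
    pi11    : probability of staying in the slow class
    """
    n = log_lik.shape[0]
    log_trans = np.log(np.array([[1 - pi01, pi01],      # from fast
                                 [1 - pi11, pi11]]))     # from slow
    delta = np.zeros((n, 2))
    back = np.zeros((n, 2), dtype=int)
    delta[0] = np.log(np.array([1 - pi1, pi1])) + log_lik[0]
    for i in range(1, n):
        scores = delta[i - 1][:, None] + log_trans       # previous class x next class
        back[i] = scores.argmax(axis=0)
        delta[i] = scores.max(axis=0) + log_lik[i]
    z = np.zeros(n, dtype=int)
    z[-1] = delta[-1].argmax()
    for i in range(n - 2, -1, -1):
        z[i] = back[i + 1, z[i + 1]]
    return z

# Illustrative: class likelihoods for 5 items under the fast and slow class
ll = np.log(np.array([[0.6, 0.4], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]))
print(most_likely_classes(ll, pi1=0.5, pi01=0.2, pi11=0.7))
```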

How to Model the Difference between Fast and Slow Responses

The aim of this study is to determine how well the Mixture Model can be used to detect the different classes and which conditions contribute to this. To investigate this, a simulation study is conducted to learn more about parameter recovery. At the end of the paper a research proposal is added regarding whether fast and slow processes can be identified in real psychological data, with respect to multiplication items.

Priors and the Density Function

In the following section the prior distributions of the estimated parameters are discussed and the density function is presented.

Priors

First the priors for the item parameters are discussed. The prior distributions of the discrimination parameters are the following:

f(\alpha_f) \propto \frac{1}{\alpha_f} \quad (8)

f(\alpha_s) \propto \frac{1}{\alpha_s} \quad (9)

These priors are chosen to make sure the discrimination parameters are positive. This is necessary because test items are assumed to be constructed this way: if the discrimination of an item were negative, a respondent with a lower ability would have a higher probability of a correct response, which is not what is intended. Such improper priors are typically chosen for non-negative parameters. Note that the improper priors will still result in proper posteriors because of the large sample size (N = 1000) and the large number of items.

Moreover, the difficulty parameters and the time intensities are assumed to come from a multivariate normal distribution:

(\beta_{if}, \beta_{is}, \xi_{if}, \ln \xi_{i\Delta}) \sim MVN(\mu_I, \Sigma_I) \quad (10)

with

\mu_I \sim MVN(\mu, \Sigma) \quad (11)

where

\mu = [\,0 \;\; 0 \;\; 0 \;\; 0\,] \quad (12)

and

\Sigma = \begin{pmatrix} 100 & 0 & 0 & 0 \\ 0 & 100 & 0 & 0 \\ 0 & 0 & 100 & 0 \\ 0 & 0 & 0 & 100 \end{pmatrix} \quad (13)

and with

\Sigma_I \sim W^{-1}(6, \Psi) \quad (14)

where

\Psi = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad (15)

The multivariate normal prior is chosen because both the difficulty parameters and the time intensities are assumed to be normally distributed and the values are possibly correlated. In addition, the priors for the mean vectors and covariance matrices are chosen for mathematical convenience of conditional conjugacy.
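As an illustration of this prior structure, the sketch below draws the item hyperparameters and one set of item parameters from Equations 10 through 15, assuming NumPy and SciPy; in the actual Gibbs sampler these quantities are of course drawn from their full conditional distributions rather than from the prior.

```python
import numpy as np
from scipy.stats import multivariate_normal, invwishart

# One draw from the hyperpriors of Equations 10-15 (a sketch only)
rng = np.random.default_rng(1)

mu_hyper = np.zeros(4)                 # Equation 12
sigma_hyper = 100 * np.eye(4)          # Equation 13
mu_I = multivariate_normal(mu_hyper, sigma_hyper).rvs(random_state=rng)   # Equation 11

sigma_I = invwishart(df=6, scale=np.eye(4)).rvs(random_state=rng)         # Equations 14-15

# Item parameters (beta_f, beta_s, xi_f, ln xi_delta) for, say, 40 items (Equation 10)
item_params = multivariate_normal(mu_I, sigma_I).rvs(size=40, random_state=rng)
print(item_params.shape)   # (40, 4)
```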

Furthermore, the prior for the person parameters is specified as follows:

(\theta_p, \tau_p) \sim MVN(\mu_P, \Sigma_P) \quad (16)

with

\mu_P = [\,0 \;\; 0\,] \quad (17)

and

\Sigma_P = \begin{pmatrix} 1 & \rho_{\theta\tau}\sigma_\theta\sigma_\tau \\ \rho_{\theta\tau}\sigma_\theta\sigma_\tau & \sigma^2_\tau \end{pmatrix} \quad (18)

As was the case with the difficulty parameters and the time intensities, θ and τ are assumed to come from a multivariate normal distribution because they are believed to be normally distributed and possibly correlated.

The means of both θ and τ are set to 0 to scale the latent factors. In addition, the variance of θ is set to 1 to identify the model. The correlation between θ and τ and the variance of τ are freely estimated using the following prior distributions:

\rho_{\theta\tau} \sim U(-1, 1) \quad (19)

A uniform distribution from −1 to 1 is chosen to ensure that the correlation is bounded between these two values. In addition, the prior distribution for the variance of τ is the following:

f(\sigma^2_\tau) \propto \frac{1}{\sigma^2_\tau} \quad (20)

This prior ensures that the variance of τ is non-negative and is justified by the same arguments as given for the priors of the discrimination parameters.

Next to the item and person parameters, the probability of starting in the slow class and the transition probabilities are assumed to come from a beta distribution:

\pi_1 \sim \text{Beta}(1, 1) \quad (21)

\pi_{01} \sim \text{Beta}(1, 1) \quad (22)

\pi_{11} \sim \text{Beta}(1, 1) \quad (23)

These priors are chosen to make sure the sampled values fall between 0 and 1, since a probability is bounded between these two values.

Finally, the prior for class membership is the following:

f(\mathbf{z} \mid \pi_1, \pi_{01}, \pi_{11}) = \sum_{Z_1=0}^{1} \sum_{Z_2=0}^{1} \cdots \sum_{Z_k=0}^{1} \left( \pi_1^{Z_1} (1 - \pi_1)^{1 - Z_1} \right) \prod_{i=2}^{n} \left( \pi_{11} Z_{i-1} + \pi_{01}(1 - Z_{i-1}) \right)^{Z_i} \left( 1 - \left(\pi_{11} Z_{i-1} + \pi_{01}(1 - Z_{i-1})\right) \right)^{1 - Z_i} \quad (24)

where z is the vector of class memberships (z = [Z_1, Z_2, ..., Z_k]), π_1 denotes the probability that a random person starts in the slow class, Z_1 is the first class, and the sum is taken over all possible class combinations of the k classes. Furthermore, π_11 is the probability of a person staying in the slow class, Z_{i−1} is the class of the item before item i, π_01 is the probability of a person switching from the fast to the slow class, and the product is taken over all n items but the first.
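The term inside the sums of Equation 24, the probability of one particular class vector z, can be computed as in the sketch below; the example sequence and probability values are illustrative.

```python
import numpy as np

def class_sequence_prob(z, pi1, pi01, pi11):
    """Probability of one particular class vector z (the term that is summed
    over in Equation 24): a Bernoulli start multiplied by first-order
    Markov transitions."""
    z = np.asarray(z)
    p = pi1 if z[0] == 1 else 1 - pi1
    for prev, cur in zip(z[:-1], z[1:]):
        p_slow = pi11 if prev == 1 else pi01      # P(next is slow | previous class)
        p *= p_slow if cur == 1 else 1 - p_slow
    return p

# Illustrative: a run of fast responses followed by slow ones
print(class_sequence_prob([0, 0, 1, 1, 1], pi1=0.5, pi01=0.2, pi11=0.7))
```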

Density Function

The density function of the data is the following:

d(\mathbf{X}, \mathbf{T} \mid \mathbf{Z}) = \prod_{p=1}^{N} \prod_{i=1}^{n} f(X_{pi}, T_{pi} \mid Z_{pi}, \theta_p, \tau_p, \boldsymbol{\alpha}, \boldsymbol{\beta}_i, \xi_{if}, \xi_{is}, \sigma^2_i) \quad (25)

X and T are assumed to be conditionally independent given Z_pi and the person and item parameters. The distribution of X_pi is given in Equation 1 and the distribution of T_pi in Equation 7.

Simulation Study

To investigate parameter recovery, the Mixture Model is applied to several simulated data sets. Six different conditions are considered; in each condition 1000 respondents are sampled and 40 items are used. First a baseline model with specified item and person parameters is put forward, in which the log of ξ_Δ is not allowed to covary with β_f, β_s, and ξ_f, and the correlation between θ and τ is assumed to be zero. Then a condition is considered in which the difference between the discrimination parameters in the two classes is smaller than in the baseline model. This condition is followed by a condition in which the difference between β_f and β_s is smaller than in the baseline model, a condition in which the probability of starting in the slow class is larger than the probability of starting in the fast class, a condition in which the mean of the log-transformed ξ_Δ is lowered compared to the value specified in the baseline model, and finally a condition in which θ and τ are allowed to covary. In each condition 50 replications are done, resulting in a total of 300 simulated data sets. The model is estimated using a Gibbs sampler with 10000 iterations, of which the first half is removed as burn-in to reduce the influence of the starting values, and every 10th iteration is retained to reduce autocorrelation. The true values of α_f, α_s, β_if, β_is, ξ_if, ξ_iΔ, z, X, ln T, σ²_i, θ, and τ as well as the estimated values are saved to assess the precision and accuracy of the estimators. Several tables present the mean difference between the estimated and true values of the model parameters and their variances. In addition, the mean proportion of correctly classified responses is computed per condition. All this is done to investigate which factors influence the parameter recovery.
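The post-processing of the chains described above (discarding the burn-in and thinning) amounts to the following sketch; the chain in the example is random and only illustrates the bookkeeping.

```python
import numpy as np

def post_process(chain, burn_in=5000, thin=10):
    """Discard the burn-in and thin the remaining draws, as described for the
    Gibbs sampler (10000 iterations, first half as burn-in, every 10th draw).

    chain : (n_iterations, ...) array of sampled values for one parameter
    """
    return chain[burn_in::thin]

# Illustrative: a chain of 10000 draws for a single parameter
chain = np.random.default_rng(0).normal(size=10000)
kept = post_process(chain)
print(kept.shape, kept.mean())   # 500 retained draws
```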

Condition I: No correlation between the log of ξ_Δ and the parameters β_f, β_s, and ξ_f, and no correlation between θ and τ

First the item parameters are discussed. Following Molenaar et al. (in press), the residual variance (σ²_i) is chosen to equal .2. Then class specific values for the discrimination parameters are specified: for α_f the value 1 is chosen and for α_s the value 2. These values reflect the assumption that slow responses discriminate better. Although conflicting conclusions are drawn in the literature about which class is more discriminative (e.g., Molenaar et al., 2016, in press; Partchev & De Boeck, 2012), multiplication items are assumed to be more discriminative on the slow side of the spectrum. This is believed because the slow strategies are assumed to be less optimal: these slow strategies will all result in a lower probability of a correct response. Furthermore, if a response is slow and correct, this gives more information about the ability of that person and the strategy he used than fast correct responses do.

After this, the values for β_if, β_is, ξ_if, and the log-transformed difference between ξ_if and ξ_is (the log of ξ_iΔ) are sampled from a multivariate normal distribution with mean vector μ_I and covariance matrix Σ_I. The mean vector μ_I is specified as follows:

\mu_I = [\, -1 \;\; 1 \;\; 2.5 \;\; \log(\sqrt{.2}) \,] \quad (26)

where the means of the difficulty parameters are chosen to emphasize that items have a higher difficulty in the slow class. The values are spread around zero to avoid extreme difficulty values. This choice follows from our emphasis on multiplication items: because the goal of multiplication tests is rarely to detect only very high or very low abilities, it is not likely that many extremely difficult or extremely easy items exist. Furthermore, the mean of the time intensity in the fast class is taken from Molenaar et al. (in press), and the mean of the difference between the time intensities in the two classes is derived from the residual variance stated above. Because responses in the slow class are assumed to take more time, the difference between the time intensities should be non-negative; the difference is therefore modeled on the log scale.

Moreover, the covariance matrix Σ_I is the following:

\Sigma_I = \begin{pmatrix} .16 & .096 & .032 & 0 \\ .096 & .16 & .024 & 0 \\ .032 & .024 & .04 & 0 \\ 0 & 0 & 0 & .01 \end{pmatrix} \quad (27)

where the variance of the difficulty parameters is chosen for the same reasons as their means: with respect to multiplication items it is not likely that there will be items with extreme difficulty values, so the variance is assumed to be small. The variances of β_f and β_s are the same because there is no reason to believe they should differ. The correlation between β_f and β_s is set to .6 to emphasize that they represent the same parameter, only in a different class: it is assumed that the difficulty of an item in the fast class resembles its difficulty in the slow class, but that the two differ as a result of, for example, solution strategy. Furthermore, the variance of ξ_f equals .04 because it is assumed that time intensities do not differ that much. Especially with real-life applications in mind, it is unlikely that the classes are clearly separated in real psychological data, and thus unlikely that the distributions shift that much. The same holds for the variance of the log-transformed ξ_Δ: it is assumed that the size of the overlapping area of the class distributions does not change that much across items and across persons. The value .01 is chosen to emphasize that multiplication items are quite similar, so the separation of the classes is expected to be quite similar for every response. Furthermore, the correlation between β_f and ξ_f is set to .4. This value is chosen to indicate that more difficult items will generally be more time intensive: the harder an item, the more time it takes to solve. This correlation is lower than the correlation between the two difficulties because equal parameters in different classes (as with the difficulties) are assumed to be more highly correlated than different parameters in the same class (as with β_f and ξ_f). However, different parameters in the same class are still assumed to be moderately correlated. Different item parameters in different classes, such as β_s and ξ_f, are also assumed to correlate, but less than different parameters in the same class; the correlation between β_s and ξ_f is set to .3. This value is chosen because, even though both the parameters and the classes differ, they still describe the same item. Finally, the correlations of the log-transformed ξ_Δ with the other item parameters are assumed to be zero: there is no clear reason to believe that the size of the overlapping area of the class distributions depends on any other item parameter, so a zero correlation is used as a baseline.

In addition to the item parameters, person parameters are specified. The ability of person p (θ_p) and the speed of the same person (τ_p) are sampled from a multivariate normal distribution with mean vector μ_P and covariance matrix Σ_P. The mean vector is the following:

\mu_P = [\, 0 \;\; 0 \,] \quad (28)

where the means of zero are chosen to scale the factors. Furthermore, the covariance matrix is the following:

\Sigma_P = \begin{pmatrix} 1 & 0 \\ 0 & .01 \end{pmatrix} \quad (29)

where the variance of θ is set to 1 to identify the model. Additionally, σ_θτ is assumed to be zero. Although this might not seem plausible in real-life administration, a zero correlation is used as a baseline. Finally, the variance of τ is specified as .01 to make sure that the proportion of variance explained by speed is lower than the proportion explained by measurement error; recall that the residual variance is set to .2. This choice is made because it is assumed that people do not differ that much in speed. Although some people will work more slowly than others, the testing situation in which multiplication items are presented is assumed to bound the speed, so the speed is thought to be roughly the same for every respondent. The value of .01 is also justified by the literature (e.g., Molenaar et al., in press), where the variance of τ appears to be very small.

Now that both the item parameters and the person parameters are specified, the data sets can be simulated. First of all, the probability of starting in the slow class (π_1) equals .5. This probability, under which both classes are equally likely, is used as a baseline. Then the class per person per item is sampled using the two transition probabilities. If the starting class (Z_p1) equals zero (i.e., the fast class), the probability of switching to the slow class (π_01), which equals .2, determines what Z_p2 will be. This transition probability is chosen to emphasize that the fast class is assumed to result from a more optimal strategy: if a person responds to an item fast, it is believed to be more likely that this person has mastered the concept and thus will use the fast, optimal strategy again. If Z_p1 equals one (i.e., the slow class), the probability of staying in that class (π_11), which equals .7, determines what Z_p2 will be. This probability is chosen because it is not likely for a person to switch strategies easily. However, π_11 is specified as lower than the probability of staying in the fast class to indicate that it is more likely for a person to keep using an optimal strategy than to keep using the non-optimal strategy. In other words, it is more likely for someone to master the concept during administration than to forget how to correctly solve a multiplication item.

After sampling the classes per person per item, the actual responses are sampled using the two-parameter logistic model. For this the class specific discrimination parameters (α_f and α_s), the class specific difficulties (β_if and β_is), and θ_p are used. Finally, to obtain a full data set, log-transformed response times are sampled from a normal model with a mean equal to the time intensity of the item minus the speed of the person (ξ_i − τ_p) and a variance equal to the residual variance (σ²_i). In computing the response times, the class specific time intensity (ξ_f in the fast class and ξ_f + ξ_Δ in the slow class) is used.
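The complete data-generating process of the baseline condition can be summarized in the following NumPy sketch. The seed and variable names are arbitrary; the parameter values are the ones specified above.

```python
import numpy as np

rng = np.random.default_rng(2016)
N, n = 1000, 40                       # respondents, items (baseline condition)

# Item parameters: (beta_f, beta_s, xi_f, ln xi_delta), Equations 26-27
mu_I = np.array([-1.0, 1.0, 2.5, np.log(np.sqrt(0.2))])
sigma_I = np.array([[0.16, 0.096, 0.032, 0.0],
                    [0.096, 0.16, 0.024, 0.0],
                    [0.032, 0.024, 0.04, 0.0],
                    [0.0,   0.0,   0.0,  0.01]])
beta_f, beta_s, xi_f, ln_xi_d = rng.multivariate_normal(mu_I, sigma_I, size=n).T
xi_s = xi_f + np.exp(ln_xi_d)         # slow time intensity is xi_f + xi_delta
alpha_f, alpha_s, sigma2 = 1.0, 2.0, 0.2

# Person parameters: (theta, tau), Equations 28-29
theta, tau = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 0.01]], size=N).T

# Classes: first item from pi1, later items from the Markov transitions
pi1, pi01, pi11 = 0.5, 0.2, 0.7
Z = np.zeros((N, n), dtype=int)
Z[:, 0] = rng.random(N) < pi1
for i in range(1, n):
    p_slow = np.where(Z[:, i - 1] == 1, pi11, pi01)
    Z[:, i] = rng.random(N) < p_slow

# Responses from the class-specific two-parameter logistic model (Equation 1)
eta = np.where(Z == 1,
               alpha_s * theta[:, None] - beta_s[None, :],
               alpha_f * theta[:, None] - beta_f[None, :])
X = (rng.random((N, n)) < 1.0 / (1.0 + np.exp(-eta))).astype(int)

# Log response times from the class-specific normal model (Equation 7)
mean_lnT = np.where(Z == 1, xi_s[None, :], xi_f[None, :]) - tau[:, None]
lnT = rng.normal(mean_lnT, np.sqrt(sigma2))

print(X.shape, lnT.shape, Z.mean())   # one simulated data set
```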

In the following paragraphs the other five conditions are discussed. For each condition, only the parameters whose values differ from those in the baseline model are described.

Condition II: A smaller difference between α_f and α_s

Since the literature provides conflicting conclusions about the values and the importance of differing discrimination parameters in the two classes (e.g., Molenaar et al., 2016, in press; Partchev & De Boeck, 2012), a condition is added in which these values are changed. Whereas in the baseline model the value 1 was chosen for α_f and the value 2 for α_s, in this second condition the difference is made somewhat smaller, to determine how small the difference between the discrimination parameters can be while the classes can still be detected. Therefore, in this second condition the value of α_f is kept the same and α_s is lowered to 1.5. The discrimination in the slow class is still assumed to be larger than the discrimination in the fast class, for the reasons stated above. Although a large difference between parameter values in the two classes makes it easier to separate the two, lowering the difference between the discrimination parameters is not expected to influence the classification much. First, there are other, more important factors, such as the difference between the time intensities, that also determine classification success. Besides, a class specific discrimination is used that does not vary across items. Therefore, the influence of a change in the discrimination parameters on the proportion of correct classifications is expected to be very small. More importantly, this condition is added to assess the accuracy and precision of the discrimination estimates.

Condition III: A smaller difference between β_f and β_s

Besides a smaller difference between the discrimination parameters, the difference between the difficulty parameters could influence classification accuracy. Therefore, in this third condition the values of the difficulty parameters are changed. Now the mean of β_f is set to −.5 and the mean of β_s is set to .5. With these values and the same variances (σ²_βf = σ²_βs = .16), extreme difficulty values will still be rare. As a result the covariance matrix Σ_I stays the same, but the mean vector μ_I changes. To avoid confusion the adjusted mean vector is named μ'_I:

\mu_I' = [\, -.5 \;\; .5 \;\; 2.5 \;\; \log(\sqrt{.2}) \,] \quad (30)

where both difficulty values are closer to zero and the difference between the two is halved, to assess the effect of such a change on the proportion of correct classifications.

In this condition, the proportion of correctly classified responses is expected to change. This expectation follows from the finding of Partchev and De Boeck (2012) that the difference between the fast and the slow class is reflected in the item difficulties. If the difference between the values of β is lowered, the proportion of correct classifications is expected to decrease. Because specific difficulty values are specified for each item in each class, the change in the difference between the two difficulties is, in contrast to the discrimination change, expected to show up in the proportion of correctly classified responses.


Condition IV: A larger probability to start in the slow class

In the fourth condition, changes are made to the part of the simulation where class membership is determined. In the baseline model it was assumed to be equally likely for a random person p to start in either the fast or the slow class. In real-life administration, however, this might not be the case. Apart from the strategy someone uses, it is more likely for someone to start a test slowly, as a result of factors such as not knowing what kind of items will be presented and nervousness. Therefore the specification of π_1 as .5 might not be an optimal choice, and for this reason the value of π_1 is changed to .7. As a result, more respondents will start in the slow class in this condition than in the baseline model. Since the transition probabilities π_01 and π_10 (1 − π_11) differ, the vector of class memberships is expected to change, which in turn influences the classification. However, although the vector of class memberships changes by increasing the probability of starting in the slow class, the Markov property of the Mixture Model is expected to make this change of little influence: because only the preceding response (i − 1) predicts the response on item i, the influence of the starting class is small. Nonetheless, this condition is included to assess the accuracy and precision of the estimates. With real-life applications in mind, it is important to assess these factors to be able to appropriately fit the model to real psychological data.

Condition V: A smaller difference between ξ_f and ξ_s

As stated above, the models for distinguishing fast from slow responses are quite new; the field is young and still developing. As a result, not much is known about how well the classes are separated in the real world. To investigate how far the classes need to be separated in order to detect them, in this fifth condition the mean of the log-transformed ξ_Δ is lowered: it is set to log(½√.2). Note that the residual standard deviation (√.2), the logarithm of which was used as the mean in the baseline model, is now halved. By changing the mean of the log-transformed ξ_Δ, the adjusted mean vector μ''_I becomes:

\mu_I'' = [\, -1.5 \;\; 1.5 \;\; 2.5 \;\; \log(\tfrac{1}{2}\sqrt{.2}) \,] \quad (31)

Since the mean of the log of ξ_Δ determines the size of the overlap between the two class distributions, the proportion of correctly classified responses is expected to change. By decreasing the difference between the fast and the slow time intensity, the separation of the classes becomes less clear. As a result, a lower proportion of correct classifications is expected.

Condition VI: A positive correlation between θ and τ

In this sixth condition a change in the person parameters is proposed. More specifically, a positive correlation between θ and τ is added relative to the baseline model. The correlation between θ and τ is specified as .7. This value is chosen to indicate that people who are better at solving multiplication items are assumed to work more quickly: if someone has mastered the solving of multiplication items, this person does not need as much time to respond to the items as someone who has not mastered the concept yet.

As a result, the covariance matrix of the person parameters changes as follows:

\Sigma_P = \begin{pmatrix} 1 & .07 \\ .07 & .01 \end{pmatrix} \quad (32)

This condition is included to assess whether adding a correlation between θ and τ improves the estimation of both variables. This is expected because the estimates can 'borrow strength' from each other: by adding a strong correlation between the two person parameters, the estimation of θ is influenced by the estimation of τ and vice versa, as though the estimation of τ pulls the estimation of θ toward its true value and the other way around.

This condition is especially included to assess the accuracy and precision of the estimates with future application in mind. As stated above, the assumption of a zero correlation between ability and speed is not likely to hold in practice; therefore, this condition serves as a starting point for future application to real psychological data. The proportion of correct classifications is not expected to change as a result of adding a correlation between θ and τ.

In the following section, the results of the simulation study are presented and the expectations formulated in the previous paragraphs are evaluated.

Results

After the simulation study is run, the estimates of the item parameters are compared to the true values. Moreover, classification accuracy is evaluated by comparing the true class vector with the estimated class vector. In the following paragraphs the results are discussed.

Item Parameters

In this section the recovery of the item parameters is discussed. In Table 1 the mean differences between the estimated item parameters and the true values and the variances are summarized.

Table 1
Mean difference between estimated and true values of the item parameters in each condition, with the mean of the estimated variance in parentheses.

Condition   α_f           α_s           β_f           β_s           ξ_f           log(ξ_Δ)
I           -.005 (.001)  .024 (.009)   .328 (.154)   .278 (.153)   .163 (.039)   .046 (.002)
II          .001 (.001)   .006 (.005)   .277 (.177)   .240 (.221)   .158 (.041)   .039 (.005)
III         .010 (.001)   .023 (.008)   .390 (.165)   .364 (.149)   .130 (.043)   .037 (.003)
IV          .016 (.001)   .029 (.008)   .267 (.159)   .332 (.147)   .150 (.040)   .033 (.003)
V           -.009 (.002)  -.003 (.018)  .286 (.156)   .359 (.152)   .150 (.042)   .019 (.001)
VI          .016 (.001)   .018 (.009)   .316 (.166)   .337 (.223)   .134 (.041)   .037 (.005)

Note. The values in each condition are obtained by averaging the sampled values across iterations and then averaging the difference between the estimated and true value across replications. The variance is derived by averaging the variance of the estimated mean differences across replications. For the difficulty parameters, the mean difference is based on absolute values.

First of all, the discrimination estimates are evaluated. As one can see, in all conditions but one the slow discrimination is estimated less accurately than the fast one. The variances of the slow discrimination estimates are also higher than those of the fast estimates in each condition. In addition, the variances of the fast discrimination estimates are close to the true value, which was specified as zero, whereas for the slow discrimination estimates this variance is somewhat further off. As a result, the discrimination estimates in the slow class are not only less accurate but also less precise. Another thing that stands out is that in the second condition, in which the slow discrimination was set to 1.5 instead of 2, the estimates in both the fast and the slow class are less biased than in the baseline model. In addition, the variance of the slow class estimate in this second condition is the lowest of all conditions, so the estimate of the slow discrimination is also more precise in the second condition.

Furthermore, when comparing the estimated difficulty parameters across conditions, the accuracy of the fast and the slow difficulties is quite similar: in some conditions the fast estimates are less accurate, in others the opposite is true. In the second condition, in which the slow discrimination was altered, the bias of the difficulty estimates decreases. In the third condition, in which the means of β_f and β_s are moved closer to zero, the bias of the estimates increases compared to the baseline model.

Besides the discrimination and the difficulty parameters, the time intensities in both classes are reviewed. The estimates of both ξ_f and the log of ξ_Δ are more accurate than the estimates of the difficulty parameters. The variances are also more stable across conditions and deviate less from the specified values. In the fifth condition, in which the separation between the two classes was made less clear, the estimates of both time intensity parameters are more accurate than in the baseline model, while the precision is almost the same in the two conditions.

Classification Accuracy

One of the most important questions in this study was whether the classification of responses would be correctly determined by the model. Therefore, the estimated starting and transition probabilities were compared with the true values and, based on both the mean and the mode, the proportion of correct classifications per condition was determined. Only one proportion of correctly classified responses is reported, because the rounded proportions based on the mean and on the mode were equal in each condition.

Table 2
Mean difference between estimated and true values of the starting and transition probabilities (mean estimated variance in parentheses), and proportion of correctly classified responses per condition.

Condition   π_1           π_01          π_11          π(Ẑ = Z_true)
I           -.014 (.003)  .002 (.000)   -.001 (.000)  .780
II          -.009 (.005)  .007 (.003)   -.014 (.007)  .778
III         .017 (.005)   .000 (.000)   -.002 (.000)  .750
IV          .188 (.002)   -.001 (.000)  -.001 (.000)  .783
V           .000 (.005)   .000 (.000)   .000 (.000)   .722
VI          .000 (.004)   .009 (.003)   -.014 (.006)  .776

Note. The values in each condition are obtained by averaging the sampled values across iterations and then averaging the difference between the estimated and true value across replications. The proportion of correctly classified responses is determined by computing the mean and mode of the class per response across iterations and comparing this class estimate to the true value.
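The proportion of correctly classified responses can be computed from the retained class draws as in the sketch below; the data in the example are generated on the spot and only illustrate the computation.

```python
import numpy as np

def classification_accuracy(z_draws, z_true):
    """Proportion of responses whose estimated class matches the true class.

    z_draws : (n_draws, n_persons, n_items) array of retained class draws
    z_true  : (n_persons, n_items) array of true classes
    The estimated class is the majority class across draws, which is what
    both the posterior-mean and posterior-mode rules reduce to for a
    binary indicator.
    """
    z_hat = (z_draws.mean(axis=0) >= 0.5).astype(int)
    return (z_hat == z_true).mean()

# Illustrative: 500 retained draws, 1000 persons, 40 items, made-up agreement
rng = np.random.default_rng(0)
z_true = rng.integers(0, 2, size=(1000, 40))
flip = rng.random((500, 1000, 40)) < 0.2          # 20% of draws disagree with truth
z_draws = np.where(flip, 1 - z_true, z_true)
print(classification_accuracy(z_draws, z_true))    # close to 1 after majority vote
```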

First, the mean differences between the estimated and the true starting probability are compared across conditions. Although these differences are small, two things stand out. In the first place, the accuracy in the fourth condition, in which the starting probability was changed to .7, is a lot lower than the accuracy in the baseline condition. Second, in the fifth and sixth conditions the starting probability is estimated almost perfectly.

Furthermore, the estimated values as well as the variances of the transition probabilities are close to their true values, which makes these estimates both accurate and precise. The only slight deviations are found in the second and sixth conditions, but because they are so small the differences are negligible.

Besides the starting and transition probabilities, the mean proportion of correctly classified responses is computed in each condition. In the baseline model 78% of the responses are classified correctly, meaning that on average 78 out of every 100 responses are assigned to their true class. As Table 2 shows, some conditions perform below this baseline value of .780 and some perform above it. The models in conditions II, III, V, and VI classify fewer responses correctly than the baseline model. Especially the third and the fifth condition stand out because of their low proportion of correctly classified responses; in these conditions the difference between the difficulties in the two classes (condition III) and the difference between the time intensities in the two classes (condition V) were altered. The fourth condition performs slightly above the baseline model regarding classification success.

In line with the expectations, decreasing the difference between the discrimination parameters has only a small negative effect on the proportion of correctly classified responses, while altering the difficulty parameters has a larger negative effect, which is also in line with the expectations. Furthermore, in the fourth condition the proportion of correctly classified responses is the highest. Although a slight change was expected, it was not expected that this condition would outperform all other conditions regarding classification success. The most striking decrease in the proportion of correctly classified responses is found in the fifth condition; this too was in line with the expectations. Finally, in the sixth condition the proportion of correct classifications decreased somewhat. Although this was not quite what was expected, the decrease in classification success is relatively small.

Discussion

In this paper the Mixture Model was successfully applied to several simulated data sets. In six conditions the accuracy and precision of the item parameter estimates were assessed and the proportion of correctly classified responses was determined. The results show that there are a couple of important factors to keep in mind in future applications, including the difference between the difficulty parameters in the two classes and the difference between the time intensities.

As was found, the estimates of the discrimination parameters in both classes were more accurate as well as more precise in the condition in which the value of α_s was lowered. Because the value of α_f was kept the same as in the baseline model, the change in bias could only be due to the lowering of α_s or to the decrease of the difference between α_f and α_s. To further investigate the reason for the accuracy difference between the discrimination estimates in the second condition and the other conditions, another condition needs to be considered: one in which the values of both α_f and α_s are decreased but the difference between the two stays the same as in the baseline model. Furthermore, it would be useful to assess what happens when the values of the discrimination parameters are switched around. It could be that the discrimination of fast responses is higher than the discrimination of slow responses instead of the proposed lower fast discrimination. This is conceivable because more slow strategies that differ in their correctness are thinkable: several strategies, such as solving the item multiple times before responding or having trouble reading the question, all result in slow responses, but some slow strategies will result in a higher probability of a correct response than others. In contrast, fast responses are assumed to represent more optimal strategies and thus all result in a higher probability of a correct response. Therefore, slow responses might be less discriminative than fast responses. This is worth investigating before fitting the model to real psychological data.

Another interesting finding is the decrease in the proportion of correctly classified responses in the third condition, in which the difference between the difficulty parameters was lowered. This seems to support the claim of Partchev and De Boeck (2012) that the difference between the fast and the slow class is reflected in the item difficulties: if the difference between the difficulty values in the two classes becomes less clear, the separation between the classes becomes less clear too. However, the decrease in the proportion of correct classifications could also be due to the decrease in the accuracy of the estimates. As a result of the larger bias in both the fast and the slow difficulty estimates in this third condition, the proportion of correctly classified responses could have decreased.

Furthermore, a suggestion for further investigation is the effect of the Markov property on the proportion of correct classifications. In all conditions, two differing transition probabilities were used. This resulted in a dependency between the class of the response on item i − 1 and the class of the response on item i. It would be important to assess whether this dependency affects the classification success, because it is not known whether the classes are dependent in real applications. Therefore, it is suggested to include another condition in which both transition probabilities are specified as .5.
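The sketch below illustrates what such a condition amounts to: when both transition probabilities equal .5, the simulated class sequence of a respondent no longer depends on the class of the previous response. The function and the probability values are illustrative assumptions, not the settings of the reported simulation.

import numpy as np

def simulate_classes(n_items, p_start_fast, p_stay_fast, p_stay_slow, rng):
    # Simulate a two-state class sequence (1 = fast, 0 = slow) for one
    # respondent, with Markov dependence between consecutive items.
    classes = np.empty(n_items, dtype=int)
    classes[0] = rng.random() < p_start_fast
    for i in range(1, n_items):
        p_fast = p_stay_fast if classes[i - 1] == 1 else 1 - p_stay_slow
        classes[i] = rng.random() < p_fast
    return classes

rng = np.random.default_rng(2)
dependent = simulate_classes(20, 0.5, 0.8, 0.8, rng)    # differing transition probabilities
independent = simulate_classes(20, 0.5, 0.5, 0.5, rng)  # suggested .5/.5 condition
print(dependent)
print(independent)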

Finally, another suggestion for future research is the inclusion of a correlation between ξf and ξ∆. If the time intensity of item i in the fast class is relatively high, the difference between the time intensities could be smaller. The reason for this is that the time intensity in the slow class, which will be higher than the time intensity in the fast class for the reasons stated above, is believed to be bounded from above: there is a certain limit to how time intensive an item can be. Hence, the larger the time intensity in the fast class, the smaller the difference between the time intensities might become. Therefore it would be informative to test a condition in which ξf and the log-transformed ξ∆ are negatively correlated.
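A minimal sketch of such a condition is given below: item time intensities for the fast class and the class difference are drawn from a bivariate normal distribution with a negative correlation, and the slow time intensity is constructed so that it stays above the fast one. The means, standard deviations, and the correlation of -.5 are assumed placeholder values, not the parameterization used in the paper.

import numpy as np

# Draw xi_f and xi_delta (assumed here to be xi_s - xi_f) with a negative correlation.
rng = np.random.default_rng(3)
n_items = 30
mean = [3.5, 0.8]                      # assumed means of xi_f and xi_delta
sd = np.array([0.4, 0.3])
corr = -0.5                            # proposed negative correlation
cov = np.array([[sd[0] ** 2, corr * sd[0] * sd[1]],
                [corr * sd[0] * sd[1], sd[1] ** 2]])

xi_f, xi_delta = rng.multivariate_normal(mean, cov, size=n_items).T
xi_s = xi_f + np.abs(xi_delta)         # slow intensity stays above the fast one
print(np.corrcoef(xi_f, xi_delta)[0, 1])   # roughly -.5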

References

Bolsinova, M. (2016). Balancing simple and complex models: Contributions to item response theory in educational measurement (Unpublished doctoral thesis).

Goldhammer, F. (2015). Measuring ability, speed, or both? Challenges, psychometric solutions, and what can be gained from experimental control. Measurement: Interdisciplinary Research and Perspectives, 13(3-4), 133-164.

Goldhammer, F., & Klein Entink, R. (2011). Speed of reasoning and its relation to reasoning ability. Intelligence, 39, 108-119.

Klein Entink, R., Kuhn, J., Hornke, L., & Fox, J. P. (2009). Evaluating cognitive theory: A joint modeling approach using responses and response times. Psychological Methods, 14(1), 54-75.

Loeys, T., Rossel, Y., & Baten, K. (2011). A joint modeling approach for reaction time and accuracy in psycholinguistic experiments. Psychometrika, 76(3), 487-503.

Molenaar, D. (2015). The value of response times in item response modeling. Measurement: Interdisciplinary Research and Perspectives, 13(3), 177-181.

Molenaar, D., Bolsinova, M., Rozsa, S., & De Boeck, P. (2016). Modeling individual differences in responses and response times to the Hungarian WISC-IV block design test. Manuscript submitted for publication.

Molenaar, D., Oberski, D., Vermunt, J., & De Boeck, P. (in press). Hidden Markov IRT models for responses and response times. Multivariate Behavioral Research.

Partchev, I., & De Boeck, P. (2012). Can fast and slow intelligence be differentiated? Intelligence, 40(1), 23-32.

Petscher, Y., Mitchell, A., & Foorman, B. (2014). Improving the reliability of student scores for speeded assessments: An illustration of conditional item response theory using a computer-administered measure of vocabulary. Reading and Writing, 28(1), 31-56.

Scherer, R., Greiff, S., & Hautamäki, J. (2015). Exploring the relation between time on task and ability in complex problem solving. Intelligence, 48, 37-50.

Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84(2), 127-190.

Van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181-204.

Van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287-308.

Van der Linden, W. J. (2008). Using response times for item selection in adaptive testing. Journal of Educational and Behavioral Statistics, 33(1), 5-20.

Van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46(3), 247-272.

Van der Linden, W. J., & Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73(3), 365-384.

Van der Maas, H. L., Molenaar, D., Maris, G., Kievit, R. A., & Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. Psychological Review, 118(2), 339-356.

Van der Maas, H. L. J., & Jansen, B. R. J. (2003). What response times tell of children's behavior on the balance scale task. Journal of Experimental Child Psychology, 85, 141-177.

Visser, I., Raijmakers, M. E., & Molenaar, P. (2002). Fitting hidden Markov models to psychological data. Scientific Programming, 10(3), 185-199.
