Academic year: 2021

Tapping in and out of random: Investigating the

behavioural and neural patterns of attention

Sven Wientjes; 12334189

MBCS research internship

Examiner: Birte Forstmann

Supervisor: Matthias Mittner


Abstract

Mind wandering is a ubiquitous phenomenon that takes place during many different tasks and in many different forms. Typical studies of mind wandering rely on thought probes to assess the focus of the participant: they are simply asked whether they were on-task or mind wandering at that point in time. Analyses then take place on the trials preceding these probes, showing possible differences in performance and brain activity between moments of on-task attention and mind wandering. The current study develops a new classification model for mind wandering, the Probed Hidden Markov Model (PHMM). The PHMM applies the unsupervised Hidden Markov Model (HMM) clustering procedure, but introduces a bias during the fitting in the form of participants' probe responses. Using simulations, we find that the PHMM improves the model fit over the HMM, specifically with an intermediate number of features (4-5) and a moderate effect size between on-task and mind wandering states (around 0.5 SD). The PHMM also provides a bias towards correct state labelling, meaning post-hoc interpretation of the meaning of the states is simplified, as the states are likely to follow the encoding of the probe responses. The PHMM was applied to real data containing both behavioural and neural measures, and identified significant differences in the means between the on-task and mind wandering states. Functional connectivity seems to be an important driving factor for identifying these two states, while ROI activity typically associated with on-task and mind wandering states seems of little impact.


Introduction

The internal contents of the mind from moment to moment are introspectively experienced as highly variable. Although William James already attempted to describe these as 'flights' and 'the stream of consciousness' in 1890 (James, 1890), the study of such purely internal experience has only gained scientific acceptance since the 1980s (Mandler, 2002). As more advanced tools for neuroimaging were developed, more possibilities emerged to study thinking without direct relation to overt behaviour. The study of mind wandering has developed in parallel with these tools. In these terms, mind wandering can be defined as the intrusions of thought that can happen during the execution of a well-defined external (or internal) task. By this definition, it is primarily an internal process. Studies estimate that between 25% and 50% of waking time is spent in a state of mind wandering (Kane et al., 2007; Yanko & Spalek, 2014), which is shown to influence and be influenced by many other variables, such as mood and task difficulty (Killingsworth & Gilbert, 2010; Casner & Schooler, 2014). These lapses of directed attention can have detrimental consequences for cognitive tasks, including high-stakes tasks such as the SAT exam (Mrazek et al., 2012), and for real-life behaviour such as aviation (Casner & Schooler, 2014) or driving (Cowley, 2013; Yanko & Spalek, 2014). However, mind wandering often arises involuntarily, and an attempt to counteract its effects can only take place when the individual engaged in mind wandering becomes aware of it (i.e. meta-awareness; Schooler et al., 2011) or an external system detects the shift of focus and engages the individual back on task (e.g. Szafir & Mutlu, 2012).

As mind wandering is a largely subjective experience, it can be hard to find a definition that allows it to be scientifically studied. Christoff et al. (2016) emphasize that mind wandering is a very dynamic process, with many different specific thoughts loosely chaining together, making every instance and moment of mind wandering for every individual highly idiosyncratic. However, as Smallwood (2013a) emphasizes, there is evidence that there are 'domain general processes' that are shared between these mind wandering states,


and even shared with certain task states. The brain as an organ can produce an incredible number of different mental states, all derived from its relatively small structure. This means experiences, including mind wandering, can emerge from simpler, more fundamental properties. This is called a 'component-process account' (Smallwood, 2013b; Smallwood et al., 2018). A component-process view of mind wandering does not propose to explain the exact contents of mind wandering at any given moment; rather, it proposes that the state of mind wandering can be studied 'in general'. Capturing the wild variability of mind wandering states in a general class of internally generated thoughts, or as they are often called in the literature, 'task unrelated' or 'stimulus independent' thoughts (Smallwood & Schooler, 2015), allows for a conceptualization of mind wandering as shown in Figure 1. In this figure, the line represents the relatedness of thoughts to the task the individual is currently engaged in. As the line moves up, more thoughts are internally generated and have a 'low coupling to perception'. This will commonly result in performance degradation through diminished attentional capacity for the external task, as captured by the background shading, with darker shades representing less directed attention.

Figure 1: An illustration of how moments of internally and externally oriented attention follow one another (shaded background) and guide the conscious stream of thought (line). Taken from Smallwood (2013a).

Intuitively, as captured by Figure 1, mind wandering can then be seen as a process

tending to one of two attractors: on-task or off-task thinking. Since it is possible to stray away from a task relying on internal representations and control, coupling to perception is only an approximation that works well for most experimental settings. If participants are asked to imagine specific scenarios, or for instance to generate random numbers, the relevant 'task' is already decoupled from perception; mind wandering, in that case, would be shifting internal attention to a different internal representation or task. Common mind wandering research uses thought probes to index participants' state along this axis of task-relatedness (Smallwood et al., 2003). An experimental task ubiquitously employed in mind wandering research is the Sustained Attention to Response Task (SART; Robertson et al., 1997). In this task, participants have to press a button in response to every stimulus appearing on the screen at equally spaced temporal intervals (typically the numbers 1-9). One of these stimuli (often the number 3) is the 'target stimulus', for which the button press should be withheld. This is a very simple continuous-performance GO-NOGO task, allowing participants to engage in deep mind wandering while still performing relatively well (only missing the target stimuli). Whereas errors in engaging cognitive tasks often predict better, more controlled performance for a short while afterwards (a process called 'post-error slowing'; Laming, 1979), on the SART errors predict more errors and faster response times (RT) on subsequent trials (Cheyne, Carriere, & Smilek, 2009). According to Cheyne et al., the lack of post-error slowing conveys the idea that the SART does not engage participants enough for errors to activate a corrective process. This means the SART is a suitable task for indirectly probing the attentional state of the participant.

In their model of mind wandering, inspired by the SART, Cheyne, Solman, et al. (2009) propose three different possible 'levels' of mind wandering. The first state is associated with increased variability in response timing. It is interpreted as transient lapses of attention to the dynamic aspects of the task (i.e. the presented stimuli). Commission errors on NOGO trials are associated with the second state of mind wandering: loss of task set and decreased attention to the general external environment of the task. Ultimately, when


participants show errors of omission on GO trials, this is a state of inattention associated with full engagement of cognitive processing on an internal task, which can also lead to the participant initiating completely unrelated behaviour (e.g. starting a conversation, looking at their phone). This model shows that mind wandering can be theorized as more complex than a simple on-off task dichotomy.

Besides behavioural features, mind wandering also shows distinct recruitment of neural processes. Originally, in PET studies, brain networks were identified that were consistently suppressed in their activity during any cognitive task (Shulman et al., 1997). Roughly the same networks seem to be activated in fMRI during passive or resting conditions (Greicius et al., 2003). Later, Mason et al. (2007) found that these networks were more active during tasks when the participants reported they were mind wandering, theorizing that this network, dubbed the Default Mode Network (DMN), reflects a kind of 'psychological baseline'. Later research shows that different components of the DMN activate preferentially during different tasks, such as those requiring social inference or biographical memory, showing task-related activity within the DMN (Andrews-Hanna et al., 2014). Buckner & DiNicola (2019) interpret this as showing that the DMN is not so much a network fundamentally anticorrelated with external task engagement, but rather a network responsible for constructing and maintaining internal representations, with its own inner functional heterogeneity dependent on the type of required internal representation. Mind wandering relies on such internal representations, but so do certain other, more task-related processes. Different mental contents during mind wandering might also rely on different functional parts of the DMN, meaning there might not be one neural 'profile' or signature of mind wandering, but multiple, although constrained to identified DMN-related networks.

To extend the granularity of neural hypotheses of mind wandering, Mittner et al. (2016) put forward a model that follows the general structure of the earlier discussed behavioural SART theory by Cheyne, Solman, et al. (2009). They propose that mind wandering does not simply transition from an external focus state to an internal focus state, but rather that


the brain first transitions through an ’off focus’ state. This off-focus state is associated with tonic high output from the Locus Coeruleus Norepinephrine system (LC-NE), causing task-unrelated networks (certain core hubs of the DMN) to activate and shift focus away from external performance. This is associated with a fundamental exploration-exploitation trade-off, just as can be seen in the model by Cheyne, Solman, et al. (2009), where the second level of mind wandering is interpreted as a loss of goal focus (equated with the off-focus state) and the third level with the acquisition of a task-unrelated goal focus (i.e. an active mind wandering state).

These extended models of mind wandering thus posit 'intermediate' states of task inattention, out of which someone might shift attention back to the task, or switch to task-unrelated thinking. Mittner et al. (2016) summarize the behavioural and neural predictions for each state clearly, though the most identifying feature of the additional intermediate state is the signal coming from the LC-NE, which is notoriously difficult to pick up with fMRI but can also be assessed through an increased pupil diameter baseline. Behaviour should become more variable than when focused on the task, but more task-appropriate than when actively mind wandering. Without high-fidelity estimates of pupil diameter and LC-NE activity, it can thus be difficult to point out a clear boundary between the intermediate and active mind wandering states. Because of such measurement difficulties and the transient nature of the off-focus state, most research that aims to classify the attentional state of the participant without using probes reverts to two-state models of attentional focus. Using a support vector machine (SVM), Mittner et al. (2014) found that such a two-state model can achieve relatively high accuracy in classifying the extremes of responses to a Likert-scale mind wandering probe (79.7%). Note that in this case, since intermediate probe responses are excluded, any trials in possible 'intermediate' states of mind wandering are most likely excluded from the training set, and in a later stage most likely classified as less certain instances of the mind wandering class. Using a time-delay multilayer perceptron, Melnychuk et al. (2020) achieve an accuracy of 76.84% in predicting mind wandering using respiratory


frequency data; however, they dichotomize performance into on- and off-task through a mean split of reaction time variability, providing a less direct supervisory signal than Mittner et al. (2014). One critical point according to Melnychuk et al. (2020) is that classification performance increases significantly when data from previous trials are incorporated in the classifier, something Mittner et al. (2014) do not do.

Whereas the previous approaches to detecting mind wandering rely on a supervised signal coming from participants' probes or known behavioural properties of mind wandering, it is also possible to fit statistical models that describe the data as a mixture of two processes without any supervision. Vandekerckhove et al. (2008) took this approach to remove contaminant trials from their data when fitting a diffusion decision model. Trials on which the participant does not pay attention will not follow the diffusion process of interest, and will thus decrease the accuracy of parameter estimation. Detecting slight differences in parameterization can, however, be difficult, and only 0.6% of all the trials in their data were classified as contaminants, in contrast to the earlier mentioned estimated rates of mind wandering of around 25% to 50%. Introducing some assumptions about participants' off-task behaviour, Hawkins et al. (2019) set up a model that races a standard diffusion decision process against a 'rhythmic' process, in which participants have the tendency to respond in synchrony with the intrinsic rhythm of an external task. This assumption holds strongly for e.g. the SART, where rhythmic button presses are a core component of the task, and for many simple decision experiments conducted in the lab. Working backwards from their fitted mixture diffusion model, they could predict the probability that any trial was generated by either the decision-related or the rhythmic diffusion process 'winning' the race. They found a correlation of -.44 between self-reported mind wandering and the chance that a trial was won by the decision process. However, neither the mixture model nor the race model takes the temporal structure of mind wandering into account. A simple extension to mixture models is the Hidden Markov Model (HMM), which takes into account the classification of the previous trial in addition to the available features of the current trial.


Bastian & Sackur (2013) fit such an HMM to SART response time data and find that two states emerge: one labelled the 'on-task' state, with RTs approximately following a Gaussian distribution, and a 'mind wandering' state, with RTs following a heavily skewed exGaussian distribution. The probabilities of transitioning from on-task to mind wandering or vice versa on any trial lie between 0.1 and 0.2. They also find that mind wandering states occur more often towards the end of their experimental blocks. Their approach, however, does not make use of reaction time variation in the fitting of the HMM. This means rhythmically generated responses during mind wandering might not be captured, as these do not lead to skewed RT data. Additionally, known brain features related to mind wandering are not included, such as trial-related activity in specific nodes of the DMN.

Zanesco et al. (2019) fit an HMM to attentional probes in the SART. They specifically focus on how likely one probe answer is to be followed by a specific next probe answer, on a scale of 1-6. The probes are presented approximately every 22.5 seconds, thus focusing on transitions of mind wandering on a larger scale than Bastian & Sackur (2013), who focused on trials following each other every 2 seconds. They find that 3 hidden Markov states capture the probe transitions best, and interpret this in line with the theory proposed by Mittner et al. (2016): an on-task state, an off-focus state featuring attentional lapses, and an active mind wandering state. All three states had very high probabilities of remaining in the same state (around .95). Interestingly, a transition from on-task to active mind wandering almost never occurred, as first the off-focus state was visited, but transitioning from mind wandering directly to on-task was more likely than transitioning through off-focus. Response time variability patterns followed a natural order through these states, with on-task showing the least variability and active mind wandering the most.

All research discussed so far relies on the SART, the quintessential task to investigate mind wandering. There are however alternative tasks available, possibly offering additional features that correlate with mind wandering. One such task is the Random Number Generation task (RNG) (Baddeley, 1966). Jahanshahi et al. (2006) showed that faster RNG


corresponds to lower scores on several indices of 'true randomness'. Faster responses are associated with decreased attentional capacity for RNG, linking them to mind wandering. Boayue et al. (2020) designed a task combining the rhythmic tapping component of the SART with the randomness measure of RNG, called the 'Finger-Tapping Random Sequence Generation task' (FT-RSGT). This task affords the investigation of both reaction time variability and random sequence generation simultaneously. It was shown that both features capture unique aspects of behaviour around mind wandering probes, possibly making the task more suitable for investigating mind wandering than the SART.

The current paper is an attempt to leverage the power of the FT-RSGT task, the temporal element of HMMs, and the use of brain features, to provide the most accurate classification of trial-by-trial mind wandering to date. Unfortunately, no ground truth regarding a participant's mind wandering state is available except around the presented probes. Since a typical HMM cannot incorporate known states into its estimation procedure, we made a small modification to the HMM to allow for this, and called it the 'Probed Hidden Markov Model' (PHMM). The first part of this thesis will explain the mathematical reasoning behind the PHMM and compare its performance on simulated data with the HMM. After this, real FT-RSGT data will be presented and analyzed.

The Probed Hidden Markov Model

The current paper relies on a novel adaptation of the Hidden Markov Model (HMM) which makes use of several known state occupations in the time series of interest, called the 'Probed Hidden Markov Model' (PHMM). This section will first describe the likelihood function for regular HMMs, followed by the adaptation introduced in the PHMM. After this, the decoding procedure for finding the occupied states in a regular HMM will be discussed, again followed by an adaptation for the PHMM case. Finally, a simulation study will explore the accuracy of both the HMM and the PHMM for data sets with different effect sizes and numbers of features. This allows us to validate whether the PHMM outperforms the HMM. The simulated data also matches the real FT-RSGT data as closely as possible, allowing us to


estimate how likely we are to find an effect.

Hidden Markov Models

HMMs are essentially mixture models with an added temporal element. In a mixture model, a series of data is not captured by one statistical distribution; rather, different data points are generated from different statistical distributions. For example, think of physical strength. If we measured the physical strength of the entire population, we might be able to describe that data with a single Gaussian distribution: the mean would describe the grand average physical strength, and the standard deviation the measured variation in strength. We could, however, speculate on a model stating that both children and the elderly are generally weaker than the middle-aged. But if we only measure physical strength, we have no information to provide the required segmentation into groups (e.g. by age) that would allow us to specify a distribution for each group separately. This is where mixture models come in: they look at the available data without any labels, and can estimate a combination of distributions that best describes the observed patterns. Besides estimating means and standard deviations for each of the three groups, the mixture model thus also needs to estimate the proportions of each group in the sample or population.

To estimate a mixture model, there needs to be a specified likelihood function. This function describes the likelihood of finding the observed data under the mixture model with a specific set of parameters. The parameters can then be found through any optimization procedure of choice. For a Gaussian mixture model, the likelihood function looks as follows:

\[
p(x \mid \pi, \mu, \sigma) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k) \tag{1}
\]

Where N(·) represents the normal distribution with parameters µ_k and σ_k, N is the number of data points in x, and K is the number of states in the mixture (e.g. 3 groups: young, middle-aged, and elderly). π is a parameter vector describing the occurrence of each of the states, e.g. if 20% of the population is elderly, the associated π_k parameter is expected to be 0.2. For this reason, Σ_{k=1}^{K} π_k = 1.
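As an illustrative sketch of equation 1 (in Python rather than the R used for the analyses in this thesis; the function name and toy data are our own), the mixture log-likelihood can be computed directly:

```python
import numpy as np
from scipy.stats import norm

def gmm_log_likelihood(x, pi, mu, sigma):
    """Log-likelihood of data x under a K-component Gaussian mixture (eq. 1).

    pi, mu, sigma are length-K sequences; the entries of pi must sum to 1.
    """
    # For each data point, sum the weighted component densities,
    # then take the log and sum over all N (independent) points.
    dens = np.array([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(len(pi))])
    return np.log(dens.sum(axis=0)).sum()

# Toy example: two groups with means 0 and 5, equal mixing proportions.
x = np.array([-1.0, 0.1, 4.9, 5.2])
ll = gmm_log_likelihood(x, pi=[0.5, 0.5], mu=[0.0, 5.0], sigma=[1.0, 1.0])
```

In practice the parameters would then be found by maximizing this function (or minimizing its negative) with an optimizer of choice.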

Such a mixture model treats every point xi as independent, meaning the order of

the data vector x is irrelevant. For processes that unfold over time, such as mind wandering, we know this is not true. A mixture model is not misspecified, but the temporal ordering of data points provides additional information that can be leveraged to estimate the parameters more accurately. For every state k we can define additional parameters that describe the probability of transitioning to another state at the next time point, including a probability of staying in the same state k. This means that, in total, K × K new parameters will be introduced, which are summarized in the Transition Probability Matrix (TPM) Γ. The rows of the TPM index the currently occupied state, and the columns index the state that is being transitioned to in the next time step. The entries are noted as γ_{i,j} and define the probabilities of the transitions from state i to state j for every possible combination. The chances of the time series being in any particular state are not guaranteed to be identical for every time point (this would be called stationarity). For this reason, an unobserved variable z is introduced, which keeps track of the state occupation as time progresses. z can be conceptualized as a vector of length T where every entry contains the index of the state occupied at that time point. The likelihood formulated like this is computed with respect to a specific sequence of states as manifested in z. The likelihood of all the observed data and the sequence of occupied states (if it were known) then becomes:

\[
p(x, z \mid \pi, \Gamma, \mu, \sigma) = \pi_{z_1} \prod_{t=2}^{T} \gamma_{z_{t-1}, z_t} \prod_{t=1}^{T} \mathcal{N}(x_t \mid \mu_{z_t}, \sigma_{z_t}) \tag{2}
\]
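To make the roles of Γ and z concrete, the following sketch (illustrative Python; all parameter values are hypothetical) simulates a two-state hidden Markov chain with Gaussian emissions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition probability matrix Gamma: rows index the current state,
# columns the state transitioned to at the next time point.
Gamma = np.array([[0.8, 0.2],
                  [0.2, 0.8]])
pi0 = np.array([1.0, 0.0])       # initial distribution: start in state 1
mu = np.array([0.0, 0.5])        # state-dependent emission means
sigma = np.array([1.0, 1.0])

T = 200
z = np.zeros(T, dtype=int)       # hidden state sequence
z[0] = rng.choice(2, p=pi0)
for t in range(1, T):
    # The next state depends only on the current one (Markov property).
    z[t] = rng.choice(2, p=Gamma[z[t - 1]])
x = rng.normal(mu[z], sigma[z])  # one Gaussian emission per time point
```

Fitting an HMM amounts to recovering Γ, µ, and σ from x alone, without observing z.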

Note that since we are dealing with a time series now, the index over entries in x has become t. The meaning of π is also slightly different: it now defines the initial distribution, i.e. the probabilities of occupying any of the K states at the start of the measured time series (i.e. t = 1).


The hidden state sequence z is not explicitly estimated in the fitting of the HMM. It is possible to compute the marginal likelihood of the observed data x for a specific parameter set by summing over all possible paths z. Naive implementations of this lead to extreme computation times, but efficient algorithms exist, such as the 'forward algorithm'; for more information the reader can consult Zucchini et al. (2017). In mathematical notation, marginalizing out the z variable amounts to a sum of K^T likelihoods, one for each possible assignment of the entries of z:

\[
p(x \mid \pi, \Gamma, \mu, \sigma) = \sum_{z_1=1}^{K} \sum_{z_2=1}^{K} \cdots \sum_{z_T=1}^{K} p(x, z \mid \pi, \Gamma, \mu, \sigma) = \sum_{z_1=1}^{K} \cdots \sum_{z_T=1}^{K} \pi_{z_1} \mathcal{N}(x_1 \mid \mu_{z_1}, \sigma_{z_1}) \, \gamma_{z_1, z_2} \mathcal{N}(x_2 \mid \mu_{z_2}, \sigma_{z_2}) \cdots \gamma_{z_{T-1}, z_T} \mathcal{N}(x_T \mid \mu_{z_T}, \sigma_{z_T}) \tag{3}
\]

Introducing probes
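A minimal sketch of such an efficient computation, assuming univariate Gaussian emissions (illustrative Python, not the thesis code), is the scaled forward recursion:

```python
import numpy as np
from scipy.stats import norm

def forward_log_likelihood(x, pi0, Gamma, mu, sigma):
    """Marginal log-likelihood p(x | pi, Gamma, mu, sigma) of an HMM via the
    scaled forward recursion, avoiding the naive K**T sum of equation 3."""
    # alpha[k] tracks (a rescaled version of) p(x_1..x_t, z_t = k).
    alpha = pi0 * norm.pdf(x[0], mu, sigma)
    log_lik = 0.0
    for t in range(1, len(x)):
        c = alpha.sum()          # scaling constant, accumulated in log space
        log_lik += np.log(c)
        alpha = (alpha / c) @ Gamma * norm.pdf(x[t], mu, sigma)
    return log_lik + np.log(alpha.sum())
```

Rescaling at each step avoids the numerical underflow that long products of small densities would otherwise cause.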

In the traditional HMM it is assumed that none of the underlying states indexed in z are known; hence the marginalization over states sums over all T entries of z. In some situations, however, several state occupations might be known. Applied to the case of mind wandering: after a trial, a participant might be asked about their attentional focus, and they can report whether they were on-task or off-task. The state occupation of the relevant trial is then known. These known states can be summarized in the variable p, a vector with P entries, each of which indexes the known state occupation of a specific trial. The trials for which we know the state through p can be dropped from the unknown variable z, making z a vector of length T − P. Imagine, for instance, that in a time series of T = 5 we have probed the second and fourth time points and know they occupy state '2' and state '1' respectively. This gives us the values p_1 = 2 and p_2 = 1. In that case, the marginalized likelihood equation can be written as:

\[
p(x, p \mid \pi, \Gamma, \mu, \sigma) = \sum_{z_1=1}^{K} \sum_{z_2=1}^{K} \sum_{z_3=1}^{K} \pi_{z_1} \mathcal{N}(x_1 \mid \mu_{z_1}, \sigma_{z_1}) \, \gamma_{z_1, p_1} \mathcal{N}(x_2 \mid \mu_{p_1}, \sigma_{p_1}) \, \gamma_{p_1, z_2} \mathcal{N}(x_3 \mid \mu_{z_2}, \sigma_{z_2}) \, \gamma_{z_2, p_2} \mathcal{N}(x_4 \mid \mu_{p_2}, \sigma_{p_2}) \, \gamma_{p_2, z_3} \mathcal{N}(x_5 \mid \mu_{z_3}, \sigma_{z_3}) \tag{4}
\]


Note that for x_2, the parameters associated with p_1, i.e. state 2, are used to calculate its density under the Gaussian; similarly, for x_4, the parameters for p_2, associated with state 1, are used. The transition probabilities to and from these probed states are still marginalized over, since the previous and following states are unknown. The logic of equation 4 extends to as many probes as one wishes to incorporate. The model implemented through this likelihood equation can be called the 'Probed Hidden Markov Model' (PHMM). Note that the indexing of the hidden states is no longer aligned with the indexing over time, which is preserved in the indexing of x.
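One way to sketch equation 4 computationally (illustrative Python, not the thesis implementation) is a forward-style recursion in which, at probed time points, all states except the reported one are zeroed out:

```python
import numpy as np
from scipy.stats import norm

def probed_forward_log_likelihood(x, pi0, Gamma, mu, sigma, probes):
    """PHMM marginal log-likelihood sketch (eq. 4): a scaled forward recursion
    in which probed time points are clamped to their reported state.

    probes: dict mapping a time index t to the known state at t.
    """
    K = len(pi0)
    alpha = pi0 * norm.pdf(x[0], mu, sigma)
    if 0 in probes:                  # clamp the first point if probed
        mask = np.zeros(K)
        mask[probes[0]] = 1.0
        alpha = alpha * mask
    log_lik = 0.0
    for t in range(1, len(x)):
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = (alpha / c) @ Gamma * norm.pdf(x[t], mu, sigma)
        if t in probes:              # keep only the probed state
            mask = np.zeros(K)
            mask[probes[t]] = 1.0
            alpha = alpha * mask
    return log_lik + np.log(alpha.sum())
```

Zeroing the non-probed states restricts the sum over paths to exactly those passing through the reported states, which is what equation 4 expresses.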

State sequence decoding

Since the parameters of the HMM are optimized while marginalizing out the state occupations, the most likely sequence of states is not immediately available after fitting an HMM. Special procedures exist for finding the optimal state sequence given the fitted parameters; most yield similar but slightly different results. The most common method for 'global decoding' (i.e. finding the most likely sequence of states for the entire time series) is called 'Viterbi decoding'. In essence, Viterbi decoding is an efficient algorithm for maximizing p(Z = z | X = x; π, Γ, µ, σ), the probability of a sequence of states given the observed data. It starts by defining

\[
\xi_{1,k} = \pi_k \, \mathcal{N}(x_1 \mid \mu_k, \sigma_k),
\]

with k indexing the different states. Then, for t = 2, 3, ..., T,

\[
\xi_{t,k} = \Big( \max_{i} \, \xi_{t-1,i} \, \gamma_{i,k} \Big) \, \mathcal{N}(x_t \mid \mu_k, \sigma_k).
\]

This keeps track of the likelihood of the most likely sequence of states ending in state k, for every time point t, creating a T × K matrix of values. From this matrix, the most likely sequence of states can be decoded by reasoning backwards. First, set

\[
k_T = \operatorname*{argmax}_{i = 1, \ldots, K} \; \xi_{T,i}.
\]

Then, backwards for t = T − 1, T − 2, ..., 1,

\[
k_t = \operatorname*{argmax}_{i = 1, \ldots, K} \; \xi_{t,i} \, \gamma_{i, k_{t+1}}.
\]

In the case of known states as encoded in p, the known time points can be skipped, which can be implemented by setting all other states at those time points to likelihood 0 (i.e. ξ_{t, i ≠ p_t} = 0).
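This recursion can be sketched as follows (illustrative Python in log space, not the thesis code; `probes` maps time indices to known states):

```python
import numpy as np
from scipy.stats import norm

def probed_viterbi(x, pi0, Gamma, mu, sigma, probes=None):
    """Viterbi decoding in log space; if `probes` maps time points to known
    states, all other states get log-likelihood -inf there (i.e. xi = 0)."""
    probes = probes or {}
    T, K = len(x), len(pi0)
    logxi = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)       # best predecessor per state
    logxi[0] = np.log(pi0) + norm.logpdf(x[0], mu, sigma)
    if 0 in probes:
        logxi[0, np.arange(K) != probes[0]] = -np.inf
    for t in range(1, T):
        scores = logxi[t - 1][:, None] + np.log(Gamma)   # scores[i, k]
        back[t] = scores.argmax(axis=0)
        logxi[t] = scores.max(axis=0) + norm.logpdf(x[t], mu, sigma)
        if t in probes:                      # force the probed state
            logxi[t, np.arange(K) != probes[t]] = -np.inf
    # Backtrack from the best final state.
    states = np.zeros(T, dtype=int)
    states[-1] = logxi[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states
```

Without probes this reduces to ordinary Viterbi decoding; with probes, the -inf entries force every decoded path through the reported states.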

Simulation study

To investigate the accuracy of the PHMM and contrast it with the HMM, we simulated data and evaluated performance with regard to several parameter settings. The


characteristics of the data were chosen to match the real data collected and discussed for the FT-RSGT later in this paper. The results from these simulations can thus also function as a type of ’power analysis’ for these real data.

The data. All simulated data contained two possible hidden states. One was supposed to represent on-task focus, while the other represented off-task focus. The data was simulated from a multivariate Gaussian emission distribution, so for each time point, several data points were generated. Each of these components has its own mean, variance, and covariance with the other components. The variance for each component was always fixed at one, while the covariances with other components were randomly drawn from a Beta distribution with parameters α = 1 and β = 10. The transition probabilities were also fixed, with the probability of staying in the same state always being 0.8, and 0.2 for transitioning to the other state. In total, 18 blocks of data were simulated. On average, there were 70 trials per block, but this was varied by randomly adding (45% of blocks) or subtracting (45% of blocks) a draw from a Poisson distribution with λ = 4. On the remaining 10% of blocks, this modification was omitted and the default length of 70 trials was used. For the entire length of the data, a hidden Markov chain was generated following the discussed transition probability structure, which was initialized in state 1 on the first trial. The simulated probes were read out from this Markov chain at the end of each block. This gives us 18 probes for, on average, 1260 trials, which is relatively few probes. While it might be interesting to test the performance of the PHMM at varying numbers of probes relative to unknown trials, our real data set of interest contained a fixed number of probes in congruence with this simulation, hence the number of probes was not varied.
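The generation scheme described above can be sketched as follows (illustrative Python; all names are our own, and for brevity the Beta-distributed covariances are omitted so features are simulated independently):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_blocks(n_blocks=18, base_len=70, n_feat=4, effect=0.6):
    """Simulate FT-RSGT-like data as described: 18 blocks of ~70 trials,
    two hidden states with self-transition probability 0.8, Gaussian
    emissions, and one probe read out at the end of each block."""
    Gamma = np.array([[0.8, 0.2], [0.2, 0.8]])
    # State 1 means fixed at 0; state 2 shifted by +/- effect per feature.
    mu = np.stack([np.zeros(n_feat),
                   rng.choice([-effect, effect], size=n_feat)])
    lengths, states, obs, probes = [], [], [], []
    z = 0                                    # chain initialized in state 1
    for _ in range(n_blocks):
        u = rng.random()
        n = base_len
        if u < 0.45:                         # 45%: add Poisson(4) trials
            n += rng.poisson(4)
        elif u < 0.90:                       # 45%: subtract Poisson(4) trials
            n -= rng.poisson(4)
        lengths.append(n)
        for _ in range(n):
            states.append(z)
            obs.append(rng.normal(mu[z], 1.0))   # unit variances
            z = rng.choice(2, p=Gamma[z])
        probes.append(states[-1])            # probe = state at end of block
    return (np.array(lengths), np.array(states),
            np.array(obs), np.array(probes))

lengths, states, obs, probes = simulate_blocks()
```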

The data was varied along two dimensions. Firstly, the number of components of the multivariate Gaussian was varied. Adding more components is expected to initially improve the fitting, as long as the components contribute unique variation: more unique components means more data is available to disambiguate the hidden states. However, at some point, adding more components increases the parameter space too much. This


yields complications with finding the maximum likelihood, since optimization in very high-dimensional spaces is technically problematic. The range of component numbers we are interested in is 2 to 6, and to test the potential for fitting higher-dimensional spaces, we also included 10.

The second parameter along which the simulated data is varied is the means of the components: specifically, the difference in means between states 1 and 2, which we will refer to as 'effect size'. In this case, the means of all Gaussians in state 1 were fixed to 0, while the means in state 2 varied following the relevant effect size. For example, if the effect size is said to be 0.6, any component's mean in state 1 will be 0, and in state 2 it will be 0.6 (or -0.6, as it is randomly drawn whether a component increases or decreases by this effect size). As a control condition, we included simulations with effect size 0. In this special case, the means would not provide an opportunity for the HMM to distinguish between the states. The only minor difference will be the covariance structure, which we do not expect to be picked up by either the HMM or the PHMM. The total set of effect sizes we simulated was 0, 0.2, 0.4, 0.6, 0.8 and 1.

The optimization For each unique combination of effect size and number of components, three data sets were simulated. This allowed us to inspect the consistency of fitting with the same parameter settings but slightly different randomly generated data, making the simulation more robust. For every data set, we fitted the HMM and PHMM 25 times each, to estimate the consistency of the results. This also allowed us to compare the best recovered fits. The procedure we used to fit was the default R optim set to a simulated annealing (SANN) optimizer, which implements the algorithm described by Bélisle (1992). The benefit of SANN is that it is good at exploring likelihood functions with multiple local minima, although at the cost of longer optimization time. New parameter proposals start out highly variable, and their variance decreases as the internal 'temperature' parameter decreases over time. This way the optimizer can first detect the presence of multiple convexities, and then start local optimization for whichever seems most promising. In HMMs and mixture models this is important, since a local minimum is expected when the parameters approach the global mean and variance. We wish to avoid optimizing this minimum, and instead optimize a minimum where two states with different means and variances are found. The SANN optimization was run for 10000 iterations every fit. For each of the 25 fitting procedures, all means were initialized randomly according to a Gaussian distribution with the true mean and variance 1. The covariance matrices for both states were initialized as the analytic covariance matrix for the entire data set. The transition probabilities were initialized as 0.8 for self-transitions and 0.2 for transitioning to the other state, mimicking the notion that we will have some idea about the transition structure for our real FT-RSGT data set (i.e. high probabilities of staying in the same state).
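The objective that such an optimizer minimizes is the negative log-likelihood of the HMM, computed with the scaled forward algorithm. Below is a minimal Python sketch under the two-state Gaussian-emission assumptions above (the study itself used R's optim with method = "SANN"; in Python, scipy.optimize.dual_annealing would be a comparable choice of optimizer for this objective).

```python
import numpy as np
from scipy.stats import multivariate_normal

def hmm_neg_loglik(obs, means, covs, trans, init=(1.0, 0.0)):
    """Negative log-likelihood of a two-state Gaussian HMM via the scaled
    forward algorithm; `obs` is (T, d), `trans` is the 2x2 transition matrix."""
    emis = np.column_stack([
        multivariate_normal.pdf(obs, mean=means[k], cov=covs[k])
        for k in range(2)
    ])
    alpha = np.asarray(init) * emis[0]
    c = alpha.sum()
    loglik, alpha = np.log(c), alpha / c
    for t in range(1, len(obs)):
        alpha = (alpha @ trans) * emis[t]   # predict, then weight by emission
        c = alpha.sum()                     # scaling avoids numerical underflow
        loglik += np.log(c)
        alpha /= c
    return -loglik
```
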

The recovery of state sequences is considered the most important metric in our study, being of higher relevance than the accuracy of the individual means and covariance matrices per state, or the exact transition probabilities. This is because the state sequence decoding captures the effects of all of these estimated parameters to some extent, providing a good overall metric of performance. In general, we would apply the regular Viterbi decoding procedure to the vanilla HMM fit, and the modified probed Viterbi decoding procedure to the PHMM fit. However, to isolate the benefit of computing the PHMM likelihood during optimization, we also applied the probed Viterbi decoding procedure to the vanilla HMM. Otherwise, any improvements due to probing might be attributed to the Viterbi decoding procedure, as opposed to the parameter recovery using SANN. The point of this simulation is mainly to show the benefits of the PHMM for this parameter recovery procedure, not during the later decoding. The decoded PHMM will be referred to as 'probed', the vanilla HMM decoded using normal Viterbi decoding as 'Viterbi', and the vanilla HMM decoded using the probed Viterbi algorithm will be called 'forced' (since it is forced to follow a state sequence through the probed states).
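The probed variant of Viterbi decoding can be sketched as a standard log-space Viterbi pass in which the probed trials are clamped: at a probe trial, every state other than the probed one receives an emission log-probability of minus infinity, so the best path is forced through the probed states. This Python sketch is an illustration of that idea, not the study's R implementation.

```python
import numpy as np

def probed_viterbi(log_emis, log_trans, log_init, probes):
    """Viterbi decoding where trials listed in `probes` (trial -> state)
    are clamped to the probed state, forcing the best path through them."""
    T, K = log_emis.shape
    le = log_emis.copy()
    for t, k in probes.items():          # clamp: other states impossible here
        mask = np.full(K, -np.inf)
        mask[k] = 0.0
        le[t] = le[t] + mask
    delta = log_init + le[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + le[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Passing an empty `probes` dictionary recovers ordinary Viterbi decoding, which is how the 'Viterbi' and 'forced' variants differ only in the probe constraint.
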

Results The recovered fits with the best likelihoods are displayed in Figure 2, where the outcome measure is the proportion of accurately recovered hidden states. For the vanilla HMM, recovered hidden states were matched to the simulated hidden states based on accuracy, so performance below 0.5 is impossible. For the PHMM, state labelling should happen automatically through the induction of probes during fitting, so no post-hoc assignment was necessary. All inspected PHMM fits performed better than 0.5, so it is assumed this permutation is correct. It can be seen that with an increase in effect size, a clear increase in accuracy is shown for both the HMM and the PHMM. Since the same optimization procedure was used for both models, both start to break down around 6 features. This is manifested by the line becoming unstable and no longer displaying a linear increase in accuracy with effect size. It is still possible to get good fits, but the optimization holds fewer guarantees, which is seen in the variability over data sets. For data set 3, it seems decent fits were found, but for 1 and 2 our optimization procedure seems lacking. This is likely because the same number of iterations was allowed in every case: models with fewer parameters have a less complex space to explore and can thus find a better fit in the same amount of time. Introducing more features also introduces combinatorially more parameters, mainly in terms of covariances. Tweaking the optimization procedure and allowing longer computational time might result in better fits of models with more parameters.

Figure 2: This plot shows the best proportion of correctly recovered states for different simulation parameters based on the likelihood. The facets display different numbers of features used over the columns, and different unique data sets over rows. The x-axis of the plots visualizes the effect size.

The advantage of the PHMM over the HMM in Figure 2 is subtle but clear. However, the number of probes we simulated is very small compared to the number of unknown trials (on average 1 probe per 70 trials), which realistically mimics our data. With relatively more probes, the PHMM is expected to perform even better. The clearest benefit of the PHMM can be seen at moderate numbers of features and moderate effect sizes (e.g. 4-5 features, around an effect size of 0.5-0.6). Here, the PHMM consistently finds better solutions that can have as much as 10-20% more correctly classified trials. This is beneficial, since this is the range of parameter settings we are mostly interested in, as it is expected to match the characteristics of the FT-RSGT data.

Having discussed the expected performance of the PHMM and showing that it is typically equivalent or superior to the HMM, we can now discuss the characteristics of the FT-RSGT data set, and the results found applying the PHMM.

Methods

Participants

Ethical approval was obtained from the ethics review board of the University of Amsterdam. 27 healthy volunteers (15 males, aged M = 27.5, SD = 7.2) participated and underwent screening for MRI compatibility and safety. Participants were excluded if they had a record of neurological or psychiatric disease, impaired vision, or any contra-indication for MRI such as medical implants or prostheses. All participants provided written consent and were compensated for their participation. Two participants were excluded from the analysis due to abnormalities in their fMRI data sets.

Experimental Task

Participants performed a modified version of the keyboard random number generation task for measuring executive control (Baddeley, 1998). Randomness in such tasks has been shown to decrease with increasing frequency of task unrelated thoughts (Teasdale et al., 1995). Participants had to press one of two buttons rhythmically entrained to an auditory stimulus tone (440Hz) which was presented at a consistent pacing of 750ms (inter-stimulus-interval, ISI). Such rhythmic entrainment has also been shown to decrease with increasing prevalence of task unrelated thoughts (Kucyi et al., 2017). The RNG component of the task requires participants to randomize the order of left and right presses as much as possible.

Participants were presented 75 tones for ’practice’ and to identify any issues with the equipment. During the experiment each participant performed 27 blocks containing between 74 and 87 tones to which they were expected to match a button press. This means a block can take between 55 and 65 seconds. After each block a probe appeared, where the participant had to rate their current task focus on a 6 point Likert scale ranging from ’clearly on task’ to ’clearly off task’. Each block could be one of two conditions, a ’random’ condition in which the participant had to produce random sequences as expected in the RNG task, or an ’alternating’ condition, in which the participant had to switch between pressing left and right every trial. The blocks were randomized in triplets, with every triplet containing one alternating and two random conditions. The order in which these were presented was random. This means in total, 9 blocks were alternating, with the remaining 18 blocks being random. During each block, the participants had to keep their focus on a cross in the middle of the screen. The probe appeared on this screen at the end of every block, indicating the end of the block. After the probe, an instruction screen appeared which would signal the start of the next block, including the type of response that was expected (i.e. random or alternating).

Functional neuroimaging

Acquisition Participants performed the task in a 3 Tesla Philips Achieva MRI system with a 32-channel head coil. T1-weighted images were acquired with a fast echo (FE) turbo field-echo (TFE) sequence in 257 sagittal slices with 0.7 mm slice thickness (FOV = 256×240×180 mm, TR = 11 ms, TE = 5.1 ms, voxel size = 0.7 mm isotropic, flip-angle = 8◦). Whole-brain functional images were acquired with multi-gradient echo EPI in 56 transverse slices with 2 mm thickness and a 0.2 mm slice gap (FOV = 224×224×123 mm, TR = 1800 ms, TE = 30 ms, flip-angle = 55◦, voxel size = 2 mm isotropic).

Preprocessing A total of 2 T1-weighted (T1w) images were found within the input BIDS dataset. All of them were corrected for intensity non-uniformity (INU) using N4BiasFieldCorrection (Tustison et al., 2010, ANTs 2.2.0). A T1w-reference map was computed after registration of 2 T1w images (after INU-correction) using mri_robust_template (FreeSurfer 6.0.1, Reuter et al., 2010). The T1w-reference was then skull-stripped using antsBrainExtraction.sh (ANTs 2.2.0), using OASIS as target template. Spatial normalization to the ICBM 152 Nonlinear Asymmetrical template version 2009c (Fonov et al., 2009, RRID:SCR_008796) was performed through nonlinear registration with antsRegistration (ANTs 2.2.0, RRID:SCR_004757, Avants et al., 2008), using brain-extracted versions of both T1w volume and template. Brain tissue segmentation of cerebrospinal fluid (CSF), white-matter (WM) and gray-matter (GM) was performed on the brain-extracted T1w using fast (FSL 5.0.9, RRID:SCR_002823, Zhang et al., 2001).

First, a reference volume and its skull-stripped version were generated using a custom methodology of fMRIPrep. A deformation field to correct for susceptibility distortions was estimated based on two echo-planar imaging (EPI) references with opposing phase-encoding directions, using 3dQwarp (AFNI, Cox & Hyde, 1997). Based on the estimated susceptibility distortion, an unwarped BOLD reference was calculated for a more accurate registration with the anatomical reference. The BOLD reference was then co-registered to the T1w reference using flirt (FSL 5.0.9, Jenkinson & Smith, 2001) with the boundary-based registration (Greve & Fischl, 2009) cost-function. Co-registration was configured with nine degrees of freedom to account for distortions remaining in the BOLD reference. Head-motion parameters with respect to the BOLD reference (transformation matrices, and six corresponding rotation and translation parameters) are estimated before any spatiotemporal filtering using mcflirt (FSL 5.0.9, Jenkinson et al., 2002). The BOLD time-series (including slice-timing correction when applied) were resampled onto their original, native space by applying a single, composite transform to correct for head-motion and susceptibility distortions. These resampled BOLD time-series will be referred to as preprocessed BOLD in original space, or just preprocessed BOLD. The BOLD time-series were resampled to MNI152NLin2009cAsym standard space, generating a preprocessed BOLD run in MNI152NLin2009cAsym space. Several confounding time-series were calculated based on the preprocessed BOLD: framewise displacement (FD), DVARS and three region-wise global signals. FD and DVARS are calculated for each functional run, both using their implementations in Nipype (following the definitions by Power et al. (2014)). The three global signals are extracted within the CSF, the WM, and the whole-brain masks. Additionally, a set of physiological regressors were extracted to allow for component-based noise correction (CompCor, Behzadi et al., 2007). Principal components are estimated after high-pass filtering the preprocessed BOLD time-series (using a discrete cosine filter with 128s cut-off) for the two CompCor variants: temporal (tCompCor) and anatomical (aCompCor). Six tCompCor components are then calculated from the top 5% variable voxels within a mask covering the subcortical regions. This subcortical mask is obtained by heavily eroding the brain mask, which ensures it does not include cortical GM regions. For aCompCor, six components are calculated within the intersection of the aforementioned mask and the union of CSF and WM masks calculated in T1w space, after their projection to the native space of each functional run (using the inverse BOLD-to-T1w transformation). The head-motion estimates calculated in the correction step were also placed within the corresponding confounds file.

All resamplings can be performed with a single interpolation step by composing all the pertinent transformations (i.e. head-motion transform matrices, susceptibility distortion correction when available, and co-registrations to anatomical and template spaces). Gridded (volumetric) resamplings were performed using antsApplyTransforms (ANTs), configured with Lanczos interpolation to minimize the smoothing effects of other kernels (Lanczos, 1964). Non-gridded (surface) resamplings were performed using mri_vol2surf (FreeSurfer). Scanner drift was regressed out separately for every participant as a final step, using a quadratic least-squares regression.

Contrasts Regions of Interest (ROIs) for further analysis, and for use as features in the PHMM, were first functionally demarcated. A contrast analysis was set up between brain activity in random vs alternating conditions, using boxcar regressors in a blocked design over all relevant blocks. We added additional nuisance regressors. Task nuisance regressors were the onset of a metronome stimulus, separate regressors for right and left finger button presses, and the onset of a probe screen. Physiological and noise regressors included the CSF, WM, framewise displacement, and 6 head motion regressors (movement and rotation in 3 dimensions). In total there were 13 nuisance regressors and 2 regressors of interest (boxcars for the alternating and random conditions). This model was first fitted per participant using a typical GLM. Mixed effects for participants were then estimated using FLAMEO-flame1 (Woolrich et al., 2004). The smoothness of the resulting statistical map was calculated using FSL and cluster-thresholded at a p-value of 0.05 and a z-statistic of 2.3, leaving us with the ROIs discussed in the results.
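At its core, the per-participant GLM step amounts to ordinary least squares on a design matrix of boxcar and nuisance columns. The Python sketch below is schematic only: the block timings and regressor set are illustrative, and the actual analysis convolved regressors with a haemodynamic response function and used FSL tooling rather than raw least squares.

```python
import numpy as np

def fit_glm(bold, design):
    """Ordinary least squares per voxel: solve design @ beta = bold.
    `bold` is (n_vols, n_voxels), `design` is (n_vols, n_regressors)."""
    betas, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return betas

def boxcar(n_vols, blocks):
    """1 during the listed (start, end) volume ranges, 0 elsewhere."""
    x = np.zeros(n_vols)
    for start, end in blocks:
        x[start:end] = 1.0
    return x

# Illustrative design: two regressors of interest plus an intercept.
n_vols = 100
X = np.column_stack([
    boxcar(n_vols, [(0, 20), (60, 80)]),   # 'random' blocks (hypothetical timing)
    boxcar(n_vols, [(30, 50)]),            # 'alternating' blocks (hypothetical timing)
    np.ones(n_vols),                       # intercept
])
```

The random > alternating contrast then corresponds to the difference between the first two beta estimates.
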

Behavioural analysis

Two aspects of the FT-RSGT behavioural data were examined in relation to attentional focus. For these measures, only the 'random' conditions of the experiment were of interest. Firstly, we calculated the 'behavioural variability' (BV) of tapping times, or 'inter-stimulus intervals' (ISIs), calculated as the standard deviation of the ISI over a particular window, divided by the mean ISI over that window. For every tap in a block, BV was calculated over that tap and every tap in the 15 seconds before it. With the tones being presented every 750ms, each window is expected to include 20 taps on average. The first 9 taps of every block were excluded; after that, BV was calculated over smaller windows if not enough taps were available (i.e. a minimum window size of 10). This same window size was used to calculate the 'Approximate Entropy' (ApEn) (Pincus & Kalman, 1997). ApEn was calculated not over ISIs but over the sequence of tapping responses (i.e. 'left' vs 'right'). ApEn indexes the 'randomness' of a sequence by reflecting the likelihood that 'similar patterns of observations will not be followed by additional similar observations' (Ho et al., 1997). A higher ApEn thus means increased randomness. A pilot study by Boayue et al. (2020) revealed that the ApEn parameter setting m = 2 yields the most stable results, hence it is used in this study.
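Both behavioural measures can be sketched in a few lines of Python (the study itself used R). The window bookkeeping below is one plausible reading of the description above, and the ApEn function follows Pincus' definition with exact pattern matching, which is natural for a binary left/right sequence.

```python
import numpy as np

def behavioural_variability(tap_times, window_s=15.0, min_taps=10, skip=9):
    """BV: sd(ISI) / mean(ISI) over the taps in the last `window_s` seconds,
    skipping the first `skip` taps and enforcing a minimum number of taps."""
    tap_times = np.asarray(tap_times, dtype=float)
    bv = np.full(len(tap_times), np.nan)
    for i in range(skip, len(tap_times)):
        lo = int(np.searchsorted(tap_times, tap_times[i] - window_s))
        lo = min(lo, max(0, i - min_taps + 1))   # widen to at least `min_taps`
        isi = np.diff(tap_times[lo:i + 1])
        bv[i] = isi.std() / isi.mean()
    return bv

def approx_entropy(seq, m=2):
    """ApEn of a symbolic sequence (Pincus): phi(m) - phi(m+1), with
    pattern similarity defined as an exact match."""
    x = np.asarray(seq)
    def phi(mm):
        n = len(x) - mm + 1
        patterns = np.array([x[i:i + mm] for i in range(n)])
        counts = np.array([(patterns == p).all(axis=1).sum() for p in patterns])
        return np.log(counts / n).mean()
    return phi(m) - phi(m + 1)
```

Perfectly rhythmic tapping gives BV near zero, and a random left/right sequence yields a higher ApEn than a deterministic alternating one.
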

Feature extraction

For the PHMM modeling, we extracted brain features in a data-driven manner. For each contrast (random > alternating & alternating > random) the activity of the most representative ROI was chosen as a feature. This should yield one brain feature related to proper task engagement, and one related to mind wandering, as the random condition is associated with a higher degree of executive control, and the alternating condition is associated with more frequent mind wandering. The ROI activity was calculated as the average activity of all voxels in the relevant cluster as identified in the contrast, linearly interpolated between the two closest TRs for every trial. Additionally, the dynamic functional connectivity of these regions was estimated, previously shown to co-vary with finger tapping tasks and mind wandering (Kucyi et al., 2017), giving a total of 3 brain features. The dynamic functional connectivity is calculated as the correlation between the two brain features over the past 20 trials. The behavioural features were used as processed before. Every feature was standardized over all participants at once to maintain individual differences.
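The two feature-processing steps described above, a sliding-window correlation and pooled standardization, can be sketched as follows in Python (function names are illustrative; the study used R).

```python
import numpy as np

def rolling_connectivity(roi_a, roi_b, window=20):
    """Dynamic functional connectivity: Pearson correlation of two ROI
    time-series over the past `window` trials (NaN until enough history)."""
    n = len(roi_a)
    fc = np.full(n, np.nan)
    for t in range(window - 1, n):
        a = roi_a[t - window + 1:t + 1]
        b = roi_b[t - window + 1:t + 1]
        fc[t] = np.corrcoef(a, b)[0, 1]
    return fc

def standardize_across(features):
    """Z-score one feature over all participants pooled, preserving
    between-participant differences (`features`: list of per-subject arrays)."""
    pooled = np.concatenate(features)
    mu, sd = np.nanmean(pooled), np.nanstd(pooled)
    return [(f - mu) / sd for f in features]
```

Standardizing over the pooled data, rather than within each participant, keeps a participant with globally higher BV or connectivity globally higher after scaling.
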

Results

Contrast ROIs

For the alternating > random condition, only two large ROIs showed up in the contrast analysis. These were the left angular gyrus (lAG) and the posterior cingulate cortex (PCC), whose locations in several slices of MNI space are shown at the top of Figure 3. Since the PCC is the largest region and is theoretically often considered a core node of the DMN, this ROI was chosen for further analysis. For the random > alternating condition, significant activity was more diffuse over the brain. A total of 8 regions survived cluster thresholding, being the right insula (rINS), left insula (lINS), right cerebellum (rCB), left cerebellum (lCB), supplementary motor area (SMA), right superior parietal cortex (rSPC), right dorsolateral prefrontal cortex (rDLPFC), and the right supramarginal gyrus (rSMG). The DLPFC specifically has been named before in executive processes such as working memory and, most interestingly, random number generation (Jahanshahi et al., 2000). Not the full rDLPFC activates in the current contrast; rather a restricted region, as can be seen at the bottom of Figure 3. This restricted rDLPFC ROI is used for further analysis.

Figure 3: Three slices of an MNI brain that best represent the ROIs found in the contrasts. On the top are the results from the alternating > random contrast. On the bottom are the results from the random > alternating contrast.

Probe analysis

As a preliminary to the PHMM modeling, we conducted a Bayesian hierarchical ordered probit regression of the behavioural variables (BV and ApEn) on the probe responses for the random condition blocks. The model was fit using Stan through the R package brms (Bayesian Regression Models using Stan; Bürkner, 2017). Figure 4 shows the coefficients and their 95% credible intervals. In this model, the parameter for the (logistically transformed) BV was positive (b=0.29 [0.16, 0.42], ER+=3999.0), meaning that when the BV increased, participants were more likely to respond with higher probes (i.e. be mind wandering). The ApEn effect was in the expected negative direction (b=-0.09 [-0.22, 0.03], ER−=13.03), but the 95% CI included 0, so it cannot be definitively said that as ApEn decreases, participants are more likely to respond with higher probes. The interaction between BV and ApEn was in the expected positive direction (b=0.07 [-0.05, 0.20], ER+=7.11), but also included 0 within the 95% CI.
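Mechanically, the ordered probit maps a linear predictor η (built from BV, ApEn, and their interaction) plus ordered cutpoints to probabilities over the six probe categories: P(y = k) = Φ(c_k − η) − Φ(c_{k−1} − η). A small Python sketch, with illustrative cutpoints rather than the fitted ones:

```python
import numpy as np
from scipy.stats import norm

def ordered_probit_probs(eta, cutpoints):
    """Category probabilities of an ordered probit: a latent normal variable
    centred on `eta` is binned by the ordered cutpoints into K+1 categories."""
    c = np.concatenate(([-np.inf], np.asarray(cutpoints, float), [np.inf]))
    return np.diff(norm.cdf(c - eta))

# Illustrative: 5 cutpoints -> 6 probe categories. A positive BV coefficient
# increases eta, shifting probability mass toward higher probe responses.
cuts = [-1.5, -0.5, 0.0, 0.5, 1.5]
```

For example, raising η by 0.29 (the reported BV coefficient, for a one-unit increase in transformed BV) increases the expected probe category relative to η = 0.
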

Hidden Markov Modeling

The fitting of PHMMs for the FT-RSGT data followed the exact same fitting and initialization procedure as discussed for the simulation study earlier. The best out of 25 fits was kept for every participant. Figure 5 shows the distributions of fitted means for each participant for the two states. The lines indicate the within-participant differences of means between the two states. Separate pairwise t-tests were conducted to inspect whether these differences per state were significant for each variable. For BV we found a significant difference between state 1 (M=-0.231, SD=0.387) and state 2 (M=0.425, SD=0.525) of the PHMM (t(24)=3.920, p<.001). Approximate Entropy showed a difference between state 1 (M=-0.115, SD=0.263) and state 2 (M=0.142, SD=0.303), opposite of what we expected with respect to BV (t(24)=2.435, p=.023), but after Bonferroni correction for multiple comparisons, the p-value does not exceed the critical α = .01. The dynamic functional connectivity between the rDLPFC and the PCC showed a significant effect between state 1 (M=-0.272, SD=0.412) and state 2 (M=0.401, SD=0.518) as well (t(24)=3.825, p<.001). However, the Shapiro-Wilk test indicated that the assumption of normality seems violated for this feature (W=0.851, p=.002). For all other features, this did not seem to be a problem. The regional activities of the rDLPFC and PCC show nothing to indicate a trend of differences between the two states. For rDLPFC, activity in state 1 (M=0.021, SD=0.235) and state 2 (M=0.049, SD=0.289) are very similar (t(24)=0.292, p=.773). This absence of effect is similar in the PCC for state 1 (M=-0.012, SD=0.199) and state 2 (M=0.011, SD=0.287), not being significant (t(24)=0.267, p=.792).

Figure 4: Estimated coefficients of a probit regression model predicting the participants' probe responses from their recorded BV and ApEn in a 20 second window before the probe onset.

Figure 6 shows a plot of the recovered covariance matrix parameters for all participants, and their differences between the two states. No statistical tests were performed on these values, as they were not part of any prespecified hypothesis. As an exploratory result, we note that the covariance between BV and ApEn in the on-task state is not significantly different from 0 (M=-0.004, SD=0.119, t(24)=-0.151, p=.882), while in the mind wandering state, the covariance is significantly less than zero (M=-0.178, SD=0.345, t(24)=-2.583, p=.016), tested using a two-sided one-sample t-test against µ=0.

Figure 5: Distribution of participant means for each variable. Grey lines indicate within-participant changes of the estimated mean between the two states.

Figure 6: Distribution of participant variances (diagonal) and covariances (off-diagonal). Grey lines indicate the within-participant changes of the estimated parameter between the two states.

To inspect if the temporal structure of the HMM yields a significant contribution to the model fit, the HMMs are compared to a multivariate Gaussian mixture model and a single multivariate Gaussian fit using the Bayesian Information Criterion (BIC). The mixture models were fit using the R package mixtools (Benaglia et al., 2009). For each participant the EM-algorithm used to estimate the mixture model was initialized 25 times, and ultimately the fit with the largest likelihood was kept. The single multivariate Gaussian was fit using the typical analytical estimation. BIC preferred the PHMM over the other models for every participant. The mixture model was preferred over the single multivariate Gaussian for every participant. Figure 7 shows the BIC values for the models for every participant. Note that comparisons between participants are not possible, since every participant has a data set with slightly different characteristics (e.g. number of trials).

Figure 7: BIC values for the Hidden Markov Model, multivariate Gaussian mixture model, and single multivariate Gaussian model for every participant. Lower values mean the model is preferred over those with higher values. Since data sets differ per participant, comparisons between participants are not possible.
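BIC trades off log-likelihood against parameter count: BIC = k·ln(n) − 2·ln(L̂), with lower values preferred. The bookkeeping of the comparison can be sketched as follows; the parameter counts use a generic accounting for d features and 2 states, and the study's exact counts may differ slightly (e.g. in how initial-state probabilities are handled).

```python
import numpy as np

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: lower is better."""
    return n_params * np.log(n_obs) - 2.0 * loglik

def n_params_models(d, K=2):
    """Free-parameter counts for the three compared models with d features:
    per-state mean (d) and covariance (d*(d+1)/2), plus the terms that differ
    between models (transition probabilities vs mixture weights)."""
    cov = d * (d + 1) // 2
    hmm = K * (d + cov) + K * (K - 1)    # + transition probabilities
    mix = K * (d + cov) + (K - 1)        # + mixture weights
    single = d + cov
    return hmm, mix, single
```

With equal log-likelihoods, the model with fewer parameters wins; the HMM is only preferred when its temporal structure buys enough extra likelihood to pay for its extra transition parameters.
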

The state occupation time for HMMs follows a geometric distribution, with the average dwell time per state being the expected number of failures of a geometric distribution whose success parameter p is the total probability of switching away from that state. Assuming one trial lasts 750ms (the interval between metronome tones), the average on-task dwell time recovered by the PHMM is 13.105 seconds. For mind wandering, the average dwell time is 9.970 seconds. Longer on-task than mind wandering dwell times are in agreement with Bastian & Sackur (2013), who find an average 18.2 seconds on-task dwell time, and 11.1 seconds mind wandering dwell time.
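The conversion from a per-trial switch probability to a dwell time in seconds is a one-liner; whether one counts only the 'failures' before switching (mean (1−p)/p trials, the convention above) or also the entry trial (mean 1/p trials) shifts the result by one ISI. A sketch with the 750 ms trial duration:

```python
def mean_dwell_time_s(p_switch, isi_s=0.75, count_entry_trial=False):
    """Expected dwell time (seconds) in a state of a discrete-time Markov
    chain with per-trial switch probability `p_switch`."""
    trials = 1.0 / p_switch if count_entry_trial else (1.0 - p_switch) / p_switch
    return trials * isi_s
```

Under the failures convention, the reported 13.105 s on-task dwell time corresponds to a per-trial switch probability of roughly 0.05.
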

Discussion

This paper attempted to perform unsupervised clustering on behavioural and neural data in order to separate on-task and off-task attentional states. It is shown that the data collected from a finger-tapping random sequence generation task is best described as coming from two separate states that have high temporal autocorrelation. Especially reaction time variability and functional connectivity between the PCC and rDLPFC can be clustered as being drawn from two very different distributions over time. These findings are not surprising, since reaction time variability has already been shown to vary strongly when participants engage in mind wandering (Kucyi et al., 2017) and has even been clustered using a two-state HMM before (Bastian & Sackur, 2013). The fact that functional connectivity seems more indicative of mind wandering than ROI activity has also been shown before (Kucyi et al., 2017; Groot et al., 2020). What is contradictory to our expectations and earlier findings reported by Boayue et al. (2020), is that ApEn is significantly higher in the state associated with higher BV, which we would link to mind wandering. This effect is also reversed in our data when only a subset of trials before each probe is considered, although the effect for ApEn is not significant in this case. Somehow, the PHMM identifies a state where BV and ApEn are both high.

Since mind wandering was originally associated with increased activity in the DMN and decreased activity in an anticorrelated network (ACN), consisting mainly of frontal control regions, it can be counter-intuitive to associate the mind wandering state with increased functional coupling between the PCC, a core node of the DMN, and the rDLPFC, a region involved in the fronto-parietal control network (FPCN), typically seen as a subset of this ACN. However, higher coupling between the DMN and ACN has been found before during mind wandering (Mittner et al., 2014). Groot et al. (2020) even specifically find functional connectivity between PCC and rDLPFC increasing when participants are off-task. This makes sense if we assume that participants can still engage in effortful mental control while they are mind wandering, only the control is not directed at the task at hand but at some personally relevant internal goal. Similar goal-directed activity and engagement within the rDLPFC and PCC can explain why there is no significant difference between these two nodes within the random condition of the experiment, between the identified on-task and mind wandering states. It is, however, more difficult to explain why these regions then show up as significant ROIs in the contrast between random and alternating conditions. Not finding an increase of activity in the PCC, representing a core DMN node, during mind wandering has also been reported by Yamashita et al. (2020), who take an approach similar to the current paper, where segregation between attentional states is not purely based on probes or behavioural performance, but also on identifiable brain-states. In fact, their segregation is based purely on brain-states. They argue that, from their findings, the DMN can actually be associated with behaviourally optimal states.

One limitation of the current study is the forced limitation to two attentional states. As discussed by Mittner et al. (2016), there are good reasons to believe the brain has 'switching mechanisms' that are more complex than a simple dichotomy between on- and off-task. Improved fits and theoretical results may appear when the PHMM is fitted with varying numbers of states, selecting the best one. This links to a practical limitation of the current analysis: As discussed in the PHMM method section, the selected optimizer does not perform optimally under the conditions of the collected finger-tapping random sequence generation task. Better fitting procedures might exist, leading to more consistent results and allowing more complicated models encompassing more neural features. In limiting ourselves to only the PCC and rDLPFC, we might miss critical interactions and activations during both on- and off-task cognition.

Future work

In the current paper, the PHMM was not used as an informative statistical test far above and beyond the results that were found using the probit regression. This is because the setup was mainly confirmatory: The PHMM should roughly agree with the probit regression results to verify that it works as intended. However, we found results for ApEn opposite to our expectations. ApEn should be higher for on-task states compared to mind wandering states, but the PHMM identifies a significant difference in the other direction. As shown by the probit regression for probe-windowed data, there is a trend towards an interaction between BV and ApEn. In the exploratory visualization of the recovered covariance matrix, we can also see there is no covariance between BV and ApEn in the on-task state, but there appears to be negative covariance between the two in the mind wandering state. Negative covariance between BV and ApEn is theoretically expected, but it seems this expectation is not captured by the recovered means of the two states. Rather, this effect remains 'hidden' in the mind wandering state. Considering the earlier discussed tripartite model, as shown in Figure 8, there are actually 2 possible states of non-optimal performance, being the 'off-focus' state and an active mind wandering state. The PHMM might confuse and collapse these into one underlying state (the mind wandering state, i.e. state 2), since we forced a two-state partition. It is expected that ApEn does not necessarily suffer much in the off-focus state, since the participant's task-set could still be focused on random sequence generation. In active mind wandering, however, ApEn is supposed to decrease significantly. This explains the found negative covariance between BV and ApEn within state 2, as opposed to a mean difference between states 1 and 2 for ApEn. It also explains the larger variance in BV in state 2, since off-focus is associated with larger BV, but active mind wandering is associated with repetitive responses that can be very accurate in timing, yet decoupled from the actual task (i.e. random sequence generation as indexed by ApEn) (Hawkins et al., 2019). The probit regression model might capture such nuances better, since it models the raw probe response values (a 6-point Likert scale). Granted, this cannot intuitively explain why there is a significant ApEn effect in the opposite direction in the current PHMM fit, but since the generative model represented by the PHMM in the current study would be misspecified, it can be unclear exactly what the PHMM 'latches onto' to partition the data into two discrete states.

Figure 8: A graphical representation of the tripartite model of mind wandering. Participants can switch between the on-task and the off-focus state, or between the mind wandering and the off-focus state. The off-focus state is hard to assess introspectively, and the idea is that participants will label this state similarly to mind wandering.

To allow the PHMM to identify more than two states, and to represent the tripartite theory more accurately in its statistics, we propose several constraints and extensions to the model. Consider the more detailed graphical representation of the tripartite model shown in Figure 9. It shows that, 'zoomed in', the off-focus state can have two different signatures: behaviour in the off-focus state will be more task appropriate when transitioning into it from an on-task state than when transitioning into it from a mind wandering state. The active task-set depends on recent cognitive history, which is reflected in the ApEn values. ApEn is expected to remain higher for the on-task to off-focus transition, and to be lower for the mind wandering to off-focus transition. Behavioural variability is related to goal focus: participants respond in consistent patterns in the on-task state because they are focused on the task. When participants engage in active mind wandering, their responses are also expected to become rhythmic (Hawkins et al., 2019), but this rhythmic responding will not be task appropriate and will hence have low ApEn. This separation of behavioural patterns within the off-focus state leads to two possible 'transition paths', visualized in Figure 9 by the differently colored arrows. The green arrows symbolize the unique transitions that are possible coming from an on-task state. Note that the off-focus state associated with mind wandering cannot be reached directly from the on-task state; vice versa, the off-focus state associated with on-task responses cannot be reached directly from the mind wandering state. Cyclical transitions between on-task and off-focus can be considered transient lapses of attention as described by Cheyne, Solman, et al. (2009): they might represent a brief loss of goal focus that gets appropriately resolved in a short amount of time.

Figure 9: A graphical representation of an extended tripartite model of mind wandering. The off-focus state has been split into two distinct states, since task-related variables such as Approximate Entropy are expected to be higher when participants lose their previous on-task goal focus than when they lose their previous mind wandering related goal focus. For this reason, transitions between on-task and the mind wandering related off-focus state are not possible, nor are transitions between mind wandering and the on-task related off-focus state.

To implement the constraint that not all transitions between states are possible, i.e. that there are no transitions between the on-task and mind wandering states and each other's related off-focus states, the tripartite PHMM will have a constrained transition probability matrix. For a two-state HMM, the transitions can be modeled using two parameters (τ1 & τ2), since the other transitions can be calculated as 1 − τ. For the tripartite model, this logic holds for the on-task and the mind wandering states: only the recurrent transition has to be estimated (τot & τmw), the transition to the appropriate off-focus state can then be calculated as 1 − τ, and the other transitions are fixed to 0. For each off-focus state, two separate transition probabilities need to be estimated, i.e. the transitions to on-task and to mind wandering. The self-transition can then be calculated as the remainder to 1.
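The constrained matrix described above can be written down directly. The sketch below is a minimal illustration only: the function name, the packaging of the off-focus exit probabilities as pairs, and the state ordering [on-task, off-focus (on-task side), off-focus (mind wandering side), mind wandering] are assumptions for exposition, not part of the model specification.

```python
import numpy as np

def tripartite_transition_matrix(tau_ot, tau_mw, off_ot, off_mw):
    """Constrained 4x4 transition matrix (illustrative sketch).

    Assumed state order: 0 = on-task, 1 = off-focus (from on-task),
    2 = off-focus (from mind wandering), 3 = mind wandering.
    tau_ot, tau_mw : self-transition probabilities of on-task / mind wandering.
    off_ot, off_mw : (p_to_on_task, p_to_mind_wandering) for each off-focus
                     state; their self-transition is the remainder to 1.
    """
    A = np.zeros((4, 4))
    # On-task: stay, or lapse into its own off-focus state only.
    A[0, 0], A[0, 1] = tau_ot, 1.0 - tau_ot
    # Mind wandering: stay, or lapse into its own off-focus state only.
    A[3, 3], A[3, 2] = tau_mw, 1.0 - tau_mw
    # Off-focus (on-task side): exit to on-task or mind wandering; rest is self.
    A[1, 0], A[1, 3] = off_ot
    A[1, 1] = 1.0 - sum(off_ot)
    # Off-focus (mind-wandering side): same structure.
    A[2, 0], A[2, 3] = off_mw
    A[2, 2] = 1.0 - sum(off_mw)
    return A

A = tripartite_transition_matrix(0.9, 0.85, (0.6, 0.1), (0.1, 0.5))
```

Any candidate parameterization produced during optimization can be validated by checking that each row sums to one and that the forbidden direct transitions (on-task ↔ mind wandering, and between the two off-focus states) remain zero.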

There exists uncertainty regarding the probe responses when the model of mind wandering is extended beyond two states. Participants might not have accurate introspection to report whether they are truly on-task, lapsing in attention to off-focus, or fully mind wandering. They can only indicate a general direction, meaning the probe is informative but not decisive. To implement this uncertainty, we propose to add two parameters to the model: an o parameter and an m parameter. These parameters estimate the proportion of probed trials where the participant is actually in the reported state (on-task or mind wandering), versus in the off-focus state. In both cases, the proportion of off-focus states is calculated as 1 − o or 1 − m respectively. For each probed trial, the state occupation will then be represented as [o, 1 − o, 0, 0] or as [0, 0, 1 − m, m], when participants indicate on-task or mind wandering respectively. For any probe responses that count as 'indecisive' in their general direction (on-task vs mind wandering), the state occupation will be represented as 0.5[o, 1 − o, 0, 0] + 0.5[0, 0, 1 − m, m]. The best values for o and m will be found during the optimization, like all the other relevant PHMM parameters.

This tripartite modification of the PHMM can be compared to the PHMM as developed in the current paper, both with 2, 3, and 4 states, with a vanilla HMM, with the multivariate normal mixture models, and with a single multivariate normal fit. This allows for careful model selection, and would reveal whether the assumptions of the tripartite PHMM improve the fit, providing evidence in favour of the tripartite model. Additionally, we have specific expectations for what the transition structure should look like, specifically with respect to the off-focus state. The dwell-time for the off-focus state should be much lower than for the on-task and mind wandering states, since this is a transient state. Additionally, when including pupil data, the baseline pupil should be higher in this off-focus state, compared to the on-task or mind wandering states, linked to increased tonic LC activity. Ultimately, we aim to fit the extended tripartite PHMM purely on behavioural and pupil data, and to use the Viterbi-decoded states to inspect the neural activity on trials of every specific state. If significant patterns exist in the neural data after classification using the extended tripartite PHMM, this indirectly validates the tripartite model as such, even if the 3- or 4-state (P)HMM finds a better model fit but does not find significant patterns in the neural data.
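The probe-informed state occupation described above can be sketched as follows. This is a minimal illustration: the function name, the string encoding of probe responses, and the state ordering [on-task, off-focus (on-task side), off-focus (mind wandering side), mind wandering] are assumptions made for exposition only.

```python
import numpy as np

def probe_occupation(response, o, m):
    """Soft state-occupation vector for a probed trial (illustrative sketch).

    response : 'on_task', 'mind_wandering', or 'indecisive'
    o, m     : estimated probabilities that a reported on-task / mind
               wandering probe truly reflects that state (vs off-focus).
    Assumed state order: on-task, off-focus(ot), off-focus(mw), mw.
    """
    on = np.array([o, 1.0 - o, 0.0, 0.0])   # reported on-task
    mw = np.array([0.0, 0.0, 1.0 - m, m])   # reported mind wandering
    if response == "on_task":
        return on
    if response == "mind_wandering":
        return mw
    return 0.5 * on + 0.5 * mw              # indecisive: mix both directions

occ = probe_occupation("indecisive", o=0.8, m=0.7)
```

Because o and m are free parameters, these occupation vectors change on every optimization step, biasing (rather than fixing) the state assignment on probed trials.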
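The dwell-time prediction for the off-focus states can be checked directly on any estimated transition matrix: a state with self-transition probability p has a geometrically distributed dwell time with mean 1/(1 − p) trials. A minimal sketch, where the matrix values are purely hypothetical (not fitted results) and the state ordering is assumed as [on-task, off-focus(ot), off-focus(mw), mind wandering]:

```python
import numpy as np

def expected_dwell_times(A):
    """Mean dwell time (in trials) per state: 1 / (1 - self-transition)."""
    return 1.0 / (1.0 - np.diag(A))

# Hypothetical transition matrix for illustration only.
A = np.array([[0.90, 0.10, 0.00, 0.00],
              [0.60, 0.30, 0.00, 0.10],
              [0.10, 0.00, 0.40, 0.50],
              [0.00, 0.00, 0.15, 0.85]])
dwell = expected_dwell_times(A)
# If the transient-state prediction holds, the two middle (off-focus)
# dwell times should be much shorter than the on-task and mw dwell times.
```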

All things considered, the current paper shows that the PHMM framework makes sensible improvements in the estimation of HMM models, and is capable of independently recovering consistent results across multiple participants. Specifying a PHMM setup that is in agreement with the tripartite model as a theory of mind wandering, and comparing it to ’naive’ PHMM and HMM implementations, will be the next scientific step forward.

References

Andrews-Hanna, J. R., Saxe, R., & Yarkoni, T. (2014). Contributions of episodic retrieval and mentalizing to autobiographical thought: evidence from functional neuroimaging, resting-state connectivity, and fMRI meta-analyses. NeuroImage, 91, 324–335.

Avants, B. B., Epstein, C. L., Grossman, M., & Gee, J. C. (2008). Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis, 12 (1), 26–41.


Baddeley, A. D. (1966). The capacity for generating information by randomization. The Quarterly Journal of Experimental Psychology, 18 (2), 119–129.

Baddeley, A. D. (1998). Random generation and the executive control of working memory. The Quarterly Journal of Experimental Psychology: Section A, 51 (4), 819–852.

Bastian, M., & Sackur, J. (2013). Mind wandering at the fingertips: automatic parsing of subjective states based on response time variability. Frontiers in Psychology, 4 , 573.

Behzadi, Y., Restom, K., Liau, J., & Liu, T. T. (2007). A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. NeuroImage, 37 (1), 90–101.

Bélisle, C. J. (1992). Convergence theorems for a class of simulated annealing algorithms on Rd. Journal of Applied Probability, 29 (4), 885–895.

Benaglia, T., Chauveau, D., Hunter, D. R., & Young, D. (2009). mixtools: An R package for analyzing finite mixture models. Journal of Statistical Software, 32 (6), 1–29. Retrieved from http://www.jstatsoft.org/v32/i06/

Boayue, N. M., Csifcsák, G., Kreis, I., Schmidt, C., Finn, I. C., Vollsund, A. E., & Mittner, M. (2020). The interplay between cognitive control, behavioral variability and mind wandering: Insights from a HD-tDCS study.

Buckner, R. L., & DiNicola, L. M. (2019). The brain’s default network: updated anatomy, physiology and evolving insights. Nature Reviews Neuroscience, 20 (10), 593–608.

Bürkner, P.-C. (2017). Advanced Bayesian multilevel modeling with the R package brms. arXiv preprint arXiv:1705.11123.

Casner, S. M., & Schooler, J. W. (2014). Thoughts in flight: Automation use and pilots’ task-related and task-unrelated thought. Human factors, 56 (3), 433–442.


Cheyne, J. A., Carriere, J. S., & Smilek, D. (2009). Absent minds and absent agents: Attention-lapse induced alienation of agency. Consciousness and Cognition, 18 (2), 481–493.

Cheyne, J. A., Solman, G. J., Carriere, J. S., & Smilek, D. (2009). Anatomy of an error: A bidirectional state model of task engagement/disengagement and attention-related errors. Cognition, 111 (1), 98–113.

Christoff, K., Irving, Z. C., Fox, K. C., Spreng, R. N., & Andrews-Hanna, J. R. (2016). Mind-wandering as spontaneous thought: a dynamic framework. Nature Reviews Neuroscience, 17 (11), 718.

Cowley, J. A. (2013). Off task thinking types and performance decrements during simulated automobile driving. In Proceedings of the human factors and ergonomics society annual meeting (Vol. 57, pp. 1214–1218).

Cox, R. W., & Hyde, J. S. (1997). Software tools for analysis and visualization of fMRI data. NMR in Biomedicine: An International Journal Devoted to the Development and Application of Magnetic Resonance In Vivo, 10 (4-5), 171–178.

Fonov, V. S., Evans, A. C., McKinstry, R. C., Almli, C., & Collins, D. (2009). Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage, 47, S102.

Greicius, M. D., Krasnow, B., Reiss, A. L., & Menon, V. (2003). Functional connectivity in the resting brain: a network analysis of the default mode hypothesis. Proceedings of the National Academy of Sciences, 100 (1), 253–258.

Greve, D. N., & Fischl, B. (2009). Accurate and robust brain image alignment using boundary-based registration. Neuroimage, 48 (1), 63–72.
