Estimating Ising models on complete and incomplete psychometric data

Academic year: 2021

on the proportion of failed estimations. The simulation study suggested that the log-linear method (Epskamp, Maris, Waldorp, & Borsboom, 2015) is superior in networks with fewer than 15 nodes and the eLasso method (Van Borkulo et al., 2014) in networks with more nodes. The second goal was to find the best imputation method for estimating an Ising model from incomplete data. In an imputation simulation study we compared the results of four imputation methods for estimating network and threshold parameters in networks with various proportions of missing values. We propose and review two repair mechanisms that are applied during imputation to prevent overfitting. The best missing data imputation results were obtained with the univariate logistic regression method coupled with the ThresDiff repair mechanism. We conclude by illustrating missing data imputation for Ising models with a real-world data example from the student practice and monitoring system Math Garden (Klinkenberg, Straatemeier, & Van Der Maas, 2011).

Keywords: Ising Model, Incomplete Data, eLasso, Full-Data-Information approach, Ising Model Estimation, Imputation

Introduction

In recent years, network models have become increasingly popular in psychological research. Network models represent relationships between directly interacting variables and they are created based on subjects’ responses. They are strong in showing relationships between constructs and have been applied in psychopathology (Cramer, Waldorp, Van Der Maas, & Borsboom, 2010; Borsboom & Cramer, 2013; Ruzzano, Borsboom, & Geurts, 2015), in intelligence research (Van Der Maas et al., 2006) and in ability research (Marsman, Maris, Bechger, & Glas, 2015). For example, in psychopathology the symptoms of a disorder can be represented by nodes in a network. The association between two symptoms, or the co-occurrence of two symptoms, is defined by an edge between them. A psychopathological disorder is then represented by a strongly connected cluster of symptoms that is associated with the disorder.

A type of network with observed variables, e.g., symptoms as nodes and the associations between them as (undirected) edges, is called a Markov Random Field (Kindermann, Snell, et al., 1980). The key element for a network to be classified as a Markov Random Field (MRF) is the Markov property, which states that when two nodes are conditionally independent given all other nodes in the network, they should not be connected by an edge (Kindermann et al., 1980). In this paper, we apply a network model to binary variables. Psychological research involves many binary variables, for example the presence/absence of a symptom in psychopathology, the correct/incorrect answer on an item in test theory, or whether or not a subject is a member of a group. The Ising model is a

popular MRF network model that is used in modelling binary variables because it captures both the pairwise interactions between variables as well as their main effects. This thesis will focus on the estimation of the Ising model on complete and incomplete data, and its application.

An Ising model can only be estimated if data are complete, i.e., there cannot be missing values in the data. In many sciences, and certainly in psychology, incomplete data are a well-known phenomenon. Examples are surveys that have not been filled in completely, measurement occasions on which subjects were absent, data from (online) environments that are difficult to control, or data that stem from an adaptive system like Math Garden (MG; Klinkenberg et al., 2011), which systematically contains missing values. In these cases, the missing values have to be imputed because a complete data set has to be available to estimate an Ising model. In this thesis, we will present an iterative procedure that enables researchers to impute missing values in the best possible manner. Previous literature about missing data imputation in Ising model networks is scarce if not absent, but there are studies about data imputation in other network models. Huisman (2009) studied missing data in unweighted, directed and undirected social networks and proposed to use conventional imputation techniques to handle missing data. He performed a simulation study in which he tested data imputation methods. Huisman (2009) imputed values by replacing missing values with the mean (Imputing unconditional means), by sampling them from the distribution of values from similar respondents in the same or other data sets (Imputing from unconditional distributions), by using the associations in the observed data in the form of regression models to predict the missing data

(2)

2 JONATHAN KLAIBER

(Imputing conditional means) and by sampling missing values from a conditional distribution of the missing data given the observed data (Imputing from conditional distributions). Huisman (2009) concluded that these conventional imputation methods do not work well for imputing missing data and suggested turning to more advanced methods to impute data for network models.
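Huisman's first two conventional strategies are easy to sketch for binary data (an illustrative Python sketch, not Huisman's code; all names are our own). The conditional variants replace the marginal column statistics used here with predictions from regression models fitted on the observed data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data (1 = present/correct, 0 = absent/incorrect) with holes.
X = rng.integers(0, 2, size=(100, 4)).astype(float)
X[rng.random(X.shape) < 0.2] = np.nan        # 20% missing completely at random

# (1) Imputing unconditional means: replace by the column mean,
#     rounded back to the binary response scale.
col_means = np.nanmean(X, axis=0)
X_mean = np.where(np.isnan(X), np.round(col_means), X)

# (2) Imputing from unconditional distributions: draw each missing entry
#     from the observed marginal proportion of its column.
X_draw = X.copy()
for j in range(X.shape[1]):
    miss = np.isnan(X[:, j])
    X_draw[miss, j] = (rng.random(miss.sum()) < col_means[j]).astype(float)
```

Both variants return a completed binary matrix; strategy (1) is deterministic, while strategy (2) preserves the marginal variance of each column.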

Di Zio, Scanu, Coppola, Luzi, and Ponti (2004) used a Bayesian network approach to impute missing values for directed unweighted graphs with a high consistency between imputed and original data sets. Their idea is that the joint probability distributions of subsets of variables can be estimated by means of Bayesian networks, and based on these distributions the missing values can be computed meaningfully. Di Zio et al. (2004)’s method of data imputation seems promising. However, a limitation of their study was that they only imputed data sets with 5% and 10% missing values. Good imputation results are expected in such a situation because a lot of information is available due to the large proportion of non-missing data. We will continue on this path and test data imputation methods for proportions of missing values of 10% and higher.

This thesis will start with a conceptual introduction of the Ising model. Thereafter, we discuss different approaches to estimate an Ising model and problems related to the estimation. The first aim of this exploratory methodological study is to find the most suitable method to estimate an Ising model network given different network characteristics. The networks differ in (1) the number of nodes, (2) the number of observations, and (3) the extent of connectivity. We test the estimation methods in different network structures: networks that are tightly connected with many edges between nodes (dense networks), somewhat connected (medium dense networks) and hardly connected with few strong edges between nodes (sparse networks). Some estimation methods perform better when there are many nodes or observations, while other methods are better suited to estimate dense instead of sparse networks. Ising model estimation techniques can be divided into two main groups: (1) methods that use, and (2) methods that do not use regularization. Regularization means introducing additional information to limit the number of parameters that have to be estimated. It is expected that an estimation method that does not use regularization will obtain better results, because the parameters do not have to be constrained, which means there is no distortion due to the constraints. However, an increasing number of nodes and a decreasing number of observations means that more parameters have to be estimated with less information. Such a situation will probably necessitate an estimation method with regularization. Although we expect this pattern to emerge, it does not make sense to formulate specific hypotheses with regard to specific estimation methods. Therefore this thesis will be purely exploratory.

The second aim of the present thesis is to find a missing data imputation technique that is suited to estimate an Ising model network based on incomplete data. The most suitable imputation technique is chosen with regard to different network characteristics (nodes, observations, connectivity) and different proportions of missing values in the data set.

The third aim is to apply an imputation technique to example data from an adaptive learning system. We illustrate the validity of our data imputation technique by applying it to a real-world correct/incorrect answers data set from the adaptive Math Garden system (Klinkenberg et al., 2011). Math Garden enables students to practice arithmetic skills in a playful manner and teachers to monitor their progress. The adaptive MG system is explained in more detail in Appendix A (Klinkenberg et al., 2011). The MG environment provides a rich source of longitudinal ability data of schoolchildren at different moments in time. In adaptive systems every subject is provided with items that fit his/her latent ability. The consequence of this system is that every subject answers different items. This is different from classical test theory, in which all subjects answer the same items. Consequently, unlike classical test theory, data in adaptive systems systematically contain missing values. The paper will be completed by a general discussion that summarizes all studies and provides suggestions for further research.

Study I: Estimating the Ising Model

In this thesis, all random variables will be represented by capital letters and (particle) states will be denoted with lower case letters. Boldfaced capital letters represent matrices with regard to parameters and random vectors with regard to random variables. A network consists of P binary variables that can take the value 1 (for example true or correct) or -1 (for example false or incorrect). The letter N denotes the number of observations on the P variables. The subscript i stands for a specific random variable Xi, whereas the subscript j denotes a different random variable Xj.

The Ising Model

The Ising model originates in statistical physics, and was developed to explain magnetism (Ising, 1925). The Ising model, in a simplified version, consists of discrete variables called particles that can be encoded in two states, namely -1 and 1. The particles are arranged in a lattice and the combination of all observed particle states is called the configuration of the network. This arrangement allows a particle to be influenced by its neighboring particles. The behavior of single particles and their connection with the surrounding particles lead to synchronized behavior of the whole system.

The most important parameters in the Ising model are the threshold parameter τ of a binary particle, which determines how likely it is for a particle i to be in a certain state, where state xi can be -1 or 1, and the network parameter ω, which determines how likely it is for two particles i and j to be in the same state. The Ising model boils down to the following distribution, which specifies the probability to observe a given configuration of the network (Ising, 1925):

\[ \Pr(X = x) = \frac{1}{Z}\exp\!\left(\sum_i \tau_i x_i + \sum_{<ij>} \omega_{ij} x_i x_j\right). \tag{1} \]

Equation 1 shows that a given configuration of the network has to be divided by all possible configurations to determine its likelihood. The normalizing constant Z (partition function) sums over all possible configurations of the network:

\[ Z = \sum_{x} \exp\!\left(\sum_i \tau_i x_i + \sum_{<ij>} \omega_{ij} x_i x_j\right). \tag{2} \]

In Ising model estimation with P variables, the P threshold parameters for all nodes are represented by a threshold vector τ. The network parameters that define the association (weight)


discussed, namely the eLasso method (Van Borkulo et al., 2014) and the Full-Data-Information method (Marsman et al., 2015).

The log-likelihood function for a given configuration x of the network can be obtained from equation 1 and 2:

\[ \mathcal{L}(\tau, \Omega; x) = \ln \Pr(X = x) = \sum_i \tau_i x_i + \sum_{<ij>} \omega_{ij} x_i x_j - \ln Z. \tag{3} \]

The normalizing constant Z, the sum over all possible configurations, has to be computed in order to get the log-likelihood of a network. The normalizing constant Z is often not tractable to compute because it requires summing over all 2^P possible configurations of the network (for P = 15 there are already 32768 possible configurations), which is only feasible for a very small number of variables (Epskamp et al., 2015; Marsman et al., 2015; Van Borkulo et al., 2014). When P increases, the number of possible configurations increases exponentially, and thus the number of nodes quickly reaches the point where there are too many possible configurations to compute with the log-likelihood function. Although maximizing the log-likelihood function of a network results in a good estimation of all Ising model parameters, it is often not applicable because the normalizing constant Z is intractable; it will therefore not be considered in the simulation studies.
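To make the combinatorial explosion concrete, Z from Equation 2 can be computed by brute force for small P. The sketch below (Python rather than the R packages used in this thesis; all names are illustrative) enumerates all 2^P configurations and verifies that the probabilities of Equation 1 sum to one:

```python
import itertools
import numpy as np

def ising_log_unnormalized(x, tau, omega):
    """Sum of tau_i * x_i over nodes plus omega_ij * x_i * x_j over pairs i < j."""
    pair = sum(omega[i, j] * x[i] * x[j]
               for i in range(len(x)) for j in range(i + 1, len(x)))
    return float(np.dot(tau, x) + pair)

def partition_function(tau, omega):
    """Brute-force Z: sum over all 2**P configurations of states {-1, 1}."""
    P = len(tau)
    return sum(np.exp(ising_log_unnormalized(np.array(x), tau, omega))
               for x in itertools.product([-1, 1], repeat=P))

# Small random network (P = 4, so only 16 configurations to enumerate).
P = 4
rng = np.random.default_rng(1)
omega = np.triu(rng.normal(0, 0.3, (P, P)), 1)
omega = omega + omega.T                       # symmetric, zero diagonal
tau = rng.uniform(-1, 0, P)

Z = partition_function(tau, omega)
# Probabilities over all configurations must sum to one (Equation 1).
total = sum(np.exp(ising_log_unnormalized(np.array(x), tau, omega)) / Z
            for x in itertools.product([-1, 1], repeat=P))
```

Doubling P squares the number of terms in the sum, which is exactly why this direct evaluation is abandoned in favour of (pseudo)likelihood approaches below.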

Log-linear Model Method. The Ising model can be represented as a log-linear model with at most second-order interactions (Epskamp et al., 2015). When there are nonzero interactions between all variables, this model is called the homogeneous association model (Agresti, 2013). The log-linear model method can, for example, be based on the “Iterative Proportional Fitting” algorithm (Haberman, 1972). In this algorithm, the deviation between observed and fitted margins is reduced at every iteration until it is smaller than a fixed tolerance. Estimates for the log-linear model can be obtained through this fitting procedure. The “EstimateIsingLL” function, implemented in the R package “IsingSampler” (Epskamp, 2015), is used to estimate an Ising model phrased as a log-linear model.

Pseudolikelihood Method. The Ising model parameters can be estimated by performing multiple logistic regressions to obtain the conditional probability of a node i given all other nodes (Epskamp et al., 2015). The conditional probability for node i given all other nodes is equivalent to a multiple logistic regression, as Epskamp et al. (2015) show:

\[ \mathcal{L}_i(\tau, \Omega; x) = x_i\!\left(\tau_i + \sum_j \omega_{ij} x_j\right) - \ln \sum_{x_i} \exp\!\left( x_i\!\left( \tau_i + \sum_j \omega_{ij} x_j \right) \right). \tag{4} \]

The pseudolikelihood (PL) uses the conditional probability of a node given all other nodes and thus reduces to the sum over all possible configurations of a single node, namely -1 and 1, which is computationally feasible. Estimation of the Ising model via pseudolikelihood is possible with the function “EstimateIsingPL” of the R package “IsingSampler” (Epskamp, 2015).

Univariate Logistic Regression Method. The disjoint pseudolikelihood is conceptually similar to the pseudolikelihood, but instead of the total PL, the conditional probability of every node i is optimized separately to obtain the best parameter estimates given the data (Liu & Ihler, 2012). One threshold parameter for every node i and two network parameters for the connection between nodes i and j are obtained by performing multiple logistic regressions (Epskamp et al., 2015). The threshold parameter τi is the intercept of the multiple logistic regression with node i as the outcome variable. Two network parameters for one edge result from regressing node i on node j and from regressing node j on node i. The final edge weight between nodes i and j is the mean of the two network parameters (Epskamp et al., 2015). The univariate logistic regression method applies the disjoint pseudolikelihood approach and can be executed with the “EstimateUni” function of the “IsingSampler” (Epskamp, 2015) R package.
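A minimal sketch of the disjoint pseudolikelihood idea, assuming {-1, 1} coding (a Python stand-in for the “EstimateUni” routine; the hand-rolled gradient-ascent fit and all names are illustrative, not the package's implementation). With this coding, P(x_i = 1 | rest) = σ(2(τ_i + Σ_j ω_ij x_j)), so the logistic intercept and coefficients are twice the Ising parameters:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(Z, y, lr=0.1, n_iter=2000):
    """Plain gradient-ascent logistic regression; returns (intercept, coefs)."""
    n, p = Z.shape
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = y - sigmoid(b0 + Z @ b)   # gradient of the log-likelihood
        b0 += lr * resid.mean()
        b += lr * (Z.T @ resid) / n
    return b0, b

def estimate_ising_uni(X):
    """Disjoint pseudolikelihood: regress each node on all others, then
    average the two coefficients obtained for every edge.  X has entries
    in {-1, 1}; with this coding tau_i = intercept / 2 and omega = coef / 2."""
    n, P = X.shape
    tau = np.zeros(P)
    W = np.zeros((P, P))
    for i in range(P):
        others = [j for j in range(P) if j != i]
        y = (X[:, i] + 1) / 2                  # recode {-1,1} -> {0,1} outcome
        b0, b = fit_logistic(X[:, others], y)
        tau[i] = b0 / 2
        W[i, others] = b / 2
    W = (W + W.T) / 2                          # mean of the two edge estimates
    return tau, W
```

The final symmetrization step is the averaging of the ω_ij and ω_ji estimates described above.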

Multinomial Logistic Regression Method. In the multinomial logistic regression method, each pair of nodes is regressed on all other nodes by multinomial logistic regressions. This regression method is an extension of the logistic regression method, because the nominal dependent variable has more than two levels. In a multinomial logistic regression, the observation score (the score of all other nodes) can be translated into a probability value for a certain outcome of a pair of nodes (possible outcomes: [1,1; -1,1; 1,-1; -1,-1]). Such a probability for a pair of nodes cannot be obtained in a normal logistic regression model with a binary outcome variable. This principle of the multinomial logistic regression allows combining the predictions of all node pairs on all other nodes without the issue of error propagation. Error propagation is the increase in estimation errors when combining multiple models. The result of the multinomial logistic regression approach is one estimate for each network parameter as well as P variable estimates for each threshold parameter. The “EstimateIsingBi” function of the R package “IsingSampler” (Epskamp, 2015) estimates the Ising model by means of the multinomial logistic regression method.

One challenge of estimating an Ising model is that each parameter requires a large number of observations. There are

2 The weight matrix is symmetrical and therefore the upper (or lower) triangle of the weight matrix is sufficient to define the network structure.


P threshold parameters have to be estimated for P variables (nodes), as well as P(P−1)/2 network parameters. In log-linear analysis, the rule of thumb is to have at least three times as many observations as there are parameters to be estimated. For example, for as few as P = 10 variables, the sample size should contain at least N = 165 observations. Thus, even if the problem of the intractable normalizing constant Z is circumvented by choosing a (disjoint) pseudolikelihood approach, many data points are still necessary to estimate reliable parameters. With too little data, and accordingly too little information, the network model is unidentifiable and parameters might increase to infinity (Agresti, 2013). Obtaining a large amount of psychometric data is challenging, because causal relationships often involve small effects that are difficult to measure in isolation, that is, measuring the effect while controlling for the effect of other confounding variables. Often only a limited amount of data can be gathered due to the complexity involved in psychological experiments. A solution to a limited amount of data when estimating network models is to apply regularization and impose constraints during the estimation procedure. Another problem that can render the model unidentifiable is low variance on several variables, as this also means that too little information is available and that regularization should be applied.
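The parameter count and the rule of thumb above can be written out directly (a small illustrative Python sketch):

```python
def n_ising_parameters(P):
    """P thresholds plus P*(P-1)/2 pairwise network parameters."""
    return P + P * (P - 1) // 2

# Rule of thumb from log-linear analysis: at least 3 observations
# per parameter to be estimated.
P = 10
n_params = n_ising_parameters(P)    # 10 thresholds + 45 edge weights = 55
n_min = 3 * n_params                # minimum sample size N = 165
```

Note how quickly the quadratic edge term dominates: at P = 20 there are already 210 parameters, which motivates the regularized methods that follow.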

eLasso Method. A popular regularization method is the least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996). Van Borkulo et al. (2014) introduced an Ising model estimation method that implements the LASSO regularization, called eLasso, which is implemented in the R package “IsingFit” (Van Borkulo, Epskamp, & Robitzsch, 2014). While each node is regressed on all other nodes, LASSO applies an ℓ1-penalty to the regression coefficients. The ℓ1-penalty causes irrelevant parameter estimates to shrink to zero. The overall effect is that fewer Ising model parameters have to be estimated because all “unimportant” parameters have been set to zero. The disadvantage of imposing a penalty is that the shrinkage of the model is directly related to the ℓ1 penalty parameter. The strength of the influence of the ℓ1 penalty on the regression coefficients is determined by a tuning parameter λ; the higher the tuning parameter λ, the more the regression coefficients shrink and the fewer parameters have to be estimated. The underlying data are not well modelled when λ is not chosen carefully, which subsequently means that the network model will be estimated poorly. To prevent poor estimation caused by the λ parameter, multiple models are generated with a maximum of 100 different penalty parameters chosen per node. The best fitting model among all models generated with these different penalty parameters is selected based on the EBIC (extended Bayesian Information Criterion; Chen & Chen, 2008). The EBIC is an extension of the BIC (Bayesian Information Criterion) for model selection from many different models (a large model space) that takes into account the number of unknown parameters and the complexity of the model space. The EBIC is computed with a hyperparameter γ that determines how large the penalty on the number of edges per node should be. This hyperparameter γ is imposed to find the most parsimonious model that explains the underlying data sufficiently while having the least possible number of connections between nodes. Van Borkulo et al. (2014) suggest that the EBIC model selection performs best when γ is set to 0.25. In the Ising model estimation simulation, we tested the eLasso method with the default γ hyperparameter of 0.25 (eLasso0.25) as well as with a γ hyperparameter of zero (eLasso0). A γ hyperparameter of zero means that there is no penalty imposed on the number of connections per node; consequently the number of connections per node is higher than when the γ hyperparameter is not zero. We would like to note that a γ hyperparameter of zero does not mean that no regularization is applied. A γ hyperparameter of zero means that sparse networks are not favoured over dense networks in the EBIC model selection process. The eLasso method obtains the best results when the network is sparse with many zero edges (Van Borkulo et al., 2014).
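The role of γ can be illustrated with the common form of the EBIC, −2ℓ + k·ln(N) + 2γ·k·ln(P), where k is the number of selected parameters out of P candidates (a sketch assuming this standard form; the log-likelihoods and counts below are made up purely for illustration):

```python
import numpy as np

def ebic(log_lik, k, n, p, gamma):
    """Extended BIC: BIC plus a model-space penalty 2*gamma*k*ln(p)
    on the k parameters selected out of p candidates (Chen & Chen, 2008)."""
    return -2 * log_lik + k * np.log(n) + 2 * gamma * k * np.log(p)

# Made-up example: a denser neighbourhood fits better (higher log-likelihood)
# but uses more parameters.  Under plain BIC (gamma = 0) the dense model wins;
# under gamma = 0.25 the extra 2*gamma*k*ln(p) term tips selection toward
# the sparser model.
n, p = 250, 19                                   # observations; candidate neighbours
bic_dense  = ebic(-100.0, k=10, n=n, p=p, gamma=0)
bic_sparse = ebic(-121.0, k=3,  n=n, p=p, gamma=0)
e_dense    = ebic(-100.0, k=10, n=n, p=p, gamma=0.25)
e_sparse   = ebic(-121.0, k=3,  n=n, p=p, gamma=0.25)
```

Setting γ = 0 thus still leaves the k·ln(N) BIC penalty in place, which matches the remark above that eLasso0 is regularized but does not favour sparse over dense networks.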

Full-Data-Information Method. An Ising model estimation approach that uses regularization in a different manner is the Full-Data-Information (FDI) approach of Marsman et al. (2015). Both Marsman et al. (2015) and Epskamp et al. (2015) show that the Ising model is equivalent to a 2-parameter multidimensional Item Response Theory model (MIRT; Reckase, 2009) with a posterior Gaussian distribution on the latent variable(s). In psychometrics, the (multidimensional) IRT model is a paradigm for measuring latent abilities, for example by a test with items of different difficulty. The benefit of the equivalence between the Ising model and the MIRT model is that the conditional distribution of the observed variables given the latent variable(s), as in a MIRT model, does not depend on the normalizing constant Z and is thus feasible to compute. Furthermore, Marsman et al. (2015) and Epskamp et al. (2015) show that the latent variable(s) in a MIRT model directly correspond to the low-rank(s) of the weight matrix in the Ising model. Large eigenvalues in a dense Ising model network have a higher predictive value than small eigenvalues, which shows that the low-rank matrix approximation is valid according to Marsman et al. (2015). In the FDI method, the number of low-ranks of the weight matrix has to be specified on theoretical grounds before the estimation starts. The Ising model parameters are simulated with an iterative Bayesian process involving a Gibbs sampler (Geman & Geman, 1984). The regularization that is used in the FDI approach is the additional information that is provided by specifying the number of low-ranks. Thus, through determining a specific number of latent variables that explain the data sufficiently, it is possible to find model parameters without the problem of overfitting. For the simulation study, the FDI method was specified with a two- and a four-rank version (FDI2 and FDI4, respectively) to test the influence of a certain number of low-ranks without any theoretical reason.

Methods

Underlying Network Model

The “underlying”, or “true”, network was an Ising model that we created to sample experimental data. The data consisted of binary responses for every subject on all variables. The simulated data set was used to estimate an Ising model. We measured the performance of an Ising model estimation method by comparing the underlying network that generated the data with the network model that was estimated based on these data; the higher the accordance between underlying and estimated network, the better the performance of the estimation method.

In the simulation studies, we varied the extent to which the underlying network was connected with three levels of “noise”. Our aim was to simulate underlying networks that


Figure 1. Example of Ising network models with P = 20 nodes and noise = 0 (a), noise = 0.1 (b) and noise = 0.2 (c). Weak edges are not shown in this figure.

were fully connected and that had an overarching structure, because such networks are plausible in psychological research. In an overarching structure, significant variables explain most variance, but this does not necessarily mean that the network has a low-rank structure. There were three levels of noise: loosely connected sparse networks (noise = 0), medium connected networks (noise = 0.1) and highly connected dense networks (noise = 0.2). Figure 1 depicts three networks that were created with the three levels of noise. The loosely connected network (noise = 0) was not a sparse network in the sense that it did not have a smaller number of connections between nodes than the maximum possible number of connections. There are very small edges between nodes in the sparse network in Figure 1 a), which are not depicted; thus even the loosely connected sparse network had no zero edges. Nevertheless, throughout this thesis we will refer to the loosely connected network with noise = 0 as a sparse network, because it only had a few strong connections. The underlying network parameters were simulated with the Barabasi game algorithm (Barabási & Albert, 1999), which is provided by the R “igraph” package (Csardi & Nepusz, 2006). First, this algorithm created a weight matrix that contained plausible network parameters and some zeros. Second, all zeros were replaced by the noise value. Third, random values, sampled from a normal distribution with a mean of zero and a standard deviation of 0.05, were added to the upper triangle of the network parameter matrix. Fourth, the network parameter matrix was made symmetric by copying the values from the upper triangle to the lower triangle of the matrix, and the matrix diagonal was set to zero. For the underlying threshold vector, threshold values were sampled from a uniform distribution with a lower bound that was defined by dividing the number of nodes by four and multiplying the resulting value by -1. The upper bound was always zero (for example, when the number of nodes was P = 20, the lower bound of the uniform distribution was -5 and the upper bound was zero). The threshold parameter values were chosen to be slightly negative because we often model nodes that represent psychopathological symptoms, for which it makes sense to be rather absent than present. The obtained true network and threshold parameters had been selected to be similar to network model parameters that are often encountered in psychological research.
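The four steps above can be sketched as follows (Python; the preferential-attachment backbone here is a simplified stand-in for igraph's Barabasi game algorithm, and the edge weight of 0.5 is an illustrative choice, not the thesis's value):

```python
import numpy as np

def simulate_true_parameters(P, noise, rng):
    """Sketch of the four steps: sparse weight backbone, zeros replaced by
    `noise`, N(0, 0.05) jitter on the upper triangle, symmetrize with a
    zero diagonal; thresholds drawn from Uniform(-P/4, 0)."""
    W = np.zeros((P, P))
    # Simplified preferential attachment: each new node connects to one
    # earlier node with probability proportional to its current degree.
    degree = np.ones(P)
    for new in range(1, P):
        target = rng.choice(new, p=degree[:new] / degree[:new].sum())
        W[target, new] = 0.5                      # illustrative edge weight
        degree[[target, new]] += 1
    W[np.triu(W == 0, 1)] = noise                 # step 2: zeros -> noise value
    iu = np.triu_indices(P, 1)
    W[iu] += rng.normal(0, 0.05, len(iu[0]))      # step 3: jitter upper triangle
    W = np.triu(W, 1)
    W = W + W.T                                   # step 4: symmetric, zero diagonal
    tau = rng.uniform(-P / 4, 0, P)               # slightly negative thresholds
    return tau, W
```

For P = 20 this reproduces the Uniform(-5, 0) threshold range mentioned above.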

Sampling Response Data

The response data that were used to estimate the Ising models were sampled by means of the function “IsingSampler”, provided by the R package of the same name (Epskamp, 2015). The “IsingSampler” function samples response values for a specified number of observations based on the underlying network and threshold parameters. The “Metropolis-Hastings” algorithm chain is initialized with a random response from the binary response options (0 and 1³; Hastings, 1970). Subsequently, for each iteration and for each node, a response is set to the other response option with the probability of the associated node taking the other response option given all other nodes and parameters (Epskamp, 2015). We specified the “IsingSampler” to do 1000 × #nodes iterations until the most probable response data, given the underlying network and threshold parameters, were generated.
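The resampling scheme can be sketched as a Gibbs-style sampler (a Python stand-in for the “IsingSampler” routine, not its implementation; with states coded {-1, 1} internally, the conditional probability of a node being 1 is the logistic function of twice its local field, and responses are returned on the 0/1 scale used in the simulations):

```python
import numpy as np

def sample_ising(tau, W, n_obs, n_sweeps, rng):
    """Start from random responses, then repeatedly resample each node from
    its conditional distribution given all other nodes:
    P(x_i = 1 | rest) = sigmoid(2 * (tau_i + sum_j W_ij * x_j))."""
    P = len(tau)
    X = rng.choice([-1.0, 1.0], size=(n_obs, P))        # random initialization
    for _ in range(n_sweeps):
        for i in range(P):
            # Local field of node i; W has a zero diagonal, so the
            # subtraction only guards against a nonzero self-weight.
            field = tau[i] + X @ W[i] - X[:, i] * W[i, i]
            p_one = 1.0 / (1.0 + np.exp(-2.0 * field))
            X[:, i] = np.where(rng.random(n_obs) < p_one, 1.0, -1.0)
    return (X + 1) / 2                                   # report 0/1 responses
```

Each sweep visits every node once, so running many sweeps corresponds to the large iteration count specified for the sampler above.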

Conditions

The simulation varied in number of nodes, number of observations, levels of noise and estimation methods. The networks that were used had 5, 7, 10, 15 and 20 nodes. Four different numbers of observations were applied, namely 50, 100, 250 and 500 observations. The numbers of nodes and observations were chosen to represent the numbers of variables and subjects in psychological experiments. Eight different estimation methods were applied: log-linear model (ll method), pseudolikelihood approach (pl method), univariate logistic regression (uni method), multinomial logistic regression (bi method), eLasso0, eLasso0.25, FDI2 and FDI4. Combining all levels of all conditions with each other led to a total of 480 combinations. We simulated each of these combinations 100 times. In total, 48000 simulations were performed for this simulation study.

Results

Network Parameters

Figure 2 depicts the correlation between the weight matrices of the true and the estimated networks. Aggregated over conditions, all methods perform similarly well, with correlations

3 In the simulation study we used the binary response options 0 and 1. This is only a difference in notation compared to the introduction section, in which the particle states (response options) -1 and 1 were used to explain the Ising model.


Figure 2. Network parameter correlations as outcome measurement of the Ising model estimation methods. Correlations between the weight matrices of the underlying and the estimated network models were computed. The figure depicts the conditions: number of nodes P (at the top), sample size (at the bottom), noise as connectivity of the network (at the right) and estimation methods as coloured lines. The grey number in the top left corner corresponds to the panel number.

in the range between 0.478 and 0.697, except FDI4 with a low correlation of r = 0.305 (SD: 0.227) and the bi method with r = 0.279 (SD: 0.34)⁴. These two methods are clearly worse than all other methods and will not be taken into account when general patterns are discussed. Figure 2 shows some general patterns: the higher the sample size, the higher the correlation between true and estimated network parameters. Moreover, when the sample size is small, there is more variance between estimation methods (for ≤ 10 nodes, the SD for 50 subjects is 0.234 whereas it is 0.125 for 500 subjects) than when the sample size is large. Exactly the opposite applies for larger node sizes: for > 10 nodes, the SD for 50 subjects is 0.123 whereas it is 0.218 for 500 subjects. These findings are not surprising, because when the sample size is small and there are only a few nodes, few parameters have to be estimated, so especially the pl and the ll method obtain good estimates. The good estimates are probably obtained because these methods do not apply constraints during estimation. When the node size is high, most estimation methods perform poorly when the number of subjects is small (for 50 subjects and 20 nodes, the mean correlation is 0.166, SD: 0.123). However, when there are many subjects, the variance between estimation methods increases (for 500 subjects and 20 nodes, r = 0.517, SD: 0.218), because some methods can handle network estimation with many nodes while others cannot. The methods suitable for network estimation with many nodes are predominantly eLasso0 and eLasso0.25, since they apply regularization. In applying regularization they limit the number of parameters that have to be estimated and thereby obtain better results. The eLasso methods perform well in networks when connectivity is low (noise is 0, Figure 2, panels 1 to 5) and reasonably well when the network is dense. When node and sample size are high, estimation with the FDI2 method returns reasonable results (for 20 nodes and 500 subjects, r = 0.419, SD: 0.145) for every level of noise. The type of regularization that the FDI method applies does return good results in networks with many nodes, but not as good as the eLasso method. The logistic regression methods do not perform as well in high node and high sample size networks; the quality of the estimations suffers when the number of parameters is high and no form of regularization is applied.

Thresholds

Figure 3 displays the correlation between the threshold vectors of the true and the estimated model. In general, threshold estimation is reliable, with correlations of 0.5 and higher aggregated over all conditions. When the underlying network is sparse (noise is 0, Figure 3, panel 1 to 5), two general patterns are visible: one for node sizes larger than 10 and one for node sizes of 10 or smaller. When node size is 10 or smaller, all methods except the two FDI methods improve in terms of threshold correlation when moving from a small sample size to a large sample size (from 0.474, SD: 0.358 for 50 observations to 0.774, SD: 0.234 for 500 observations). The correlations of the FDI methods stagnate when moving from a small to a large sample size, suggesting that these methods cannot benefit from the additional information of an increased sample size to improve the model fit. As Marsman et al. (2015) state in their article, the FDI methods perform best when networks are dense and have a low-rank structure. It is therefore expected that the FDI methods do not obtain good results when estimating a network with only a few strong connections and without a low-rank structure. In contrast, when the underlying network is dense, as in the third row of Figure 3 (noise is 0.2), the FDI methods perform similarly to the other estimation methods.

Figure 3. Threshold correlations as outcome measurement of the Ising model estimation methods. Correlations between the threshold vectors of the true and the estimated network models were computed. The figure depicts the conditions: number of nodes P (at the top), sample size (at the bottom), noise as connectivity of the network (at the right) and estimation methods as coloured lines. The grey number in the top left corner corresponds to the panel number.

4 The supplementary Appendix Tables lists the means and

The second general pattern of threshold correlations, when node size is larger than 10, shows more differences between methods. As was the case with the network parameter estimations, the logistic regression methods are poor at estimating the thresholds due to the high number of parameters. The log-linear model method also struggles with a high number of parameters, as described in the Failed Estimations section. For sparse and medium dense networks (noise is 0 or 0.1), the eLasso methods perform supremely well (for 20 nodes and noise 0, the mean correlation for the eLasso methods is 0.895, SD: 0.142; for noise 0.1 it is 0.896, SD: 0.125).

Eigenvalues

Figure 4 depicts correlations between the eigenvalues of the true and the estimated networks. In general, the eigenvalue correlations are very high. Nevertheless, there are two tendencies worth noting. First, aggregated over all conditions, the FDI methods seem to be the poorest at estimating the eigenvalue structure of a network. Second, the differences between estimation methods are largest in loosely connected networks (noise is 0 and 0.1) with 15 and 20 nodes (Figure 4, panel 4, 5, 9 and 10). The cause of this tendency is probably the combination of a sparse network paired with a high number of nodes: a sparse network with many nodes provides too little information for some methods to reliably recover the eigenvalue structure.

Failed Estimations

Figure 5 depicts the proportion of failed network estimations. A network estimation fails when the resulting threshold vector and weight matrix contain undefined values (NaN values). The reason for an estimation failure has to be assessed per method. For example, failure of the log-linear method might be caused by zero values for the expected frequencies, a violation of a log-linear analysis assumption. The proportion of failed estimations is interesting with regard to the log-linear method and the multinomial logistic regression method. The more nodes, the more parameters have to be estimated and the higher the proportion of failed estimations. The proportion of failed estimations for the ll method is 0.995 with 20 nodes in a sparse network (Figure 5, panel 5). In a dense network with a large sample size the proportion slightly decreases, because more information is available. The proportion of failed estimations for the bi method is related to the connectivity of the network. Only when noise is 0.2 (Figure 5, panel 15) is there a substantial proportion (48.5%) of failed estimations.

Figure 4. Eigenvalue correlations as outcome measurement of the Ising model estimation methods. Correlations between the eigenvalues of the true and the estimated network models were computed. The figure depicts the conditions: number of nodes P (at the top), sample size (at the bottom), noise as connectivity of the network (at the right) and estimation methods as coloured lines. The grey number in the top left corner corresponds to the panel number.

JONATHAN KLAIBER

Discussion

All tested Ising model estimation methods perform well for networks with 10 or fewer nodes, except the multinomial logistic regression and the FDI4 method. For an Ising network of more than 10 nodes, the different methods have their strengths and weaknesses. In general, very good results were obtained with the log-linear model method. The ll method outperforms the other methods under the condition that there is enough information available for the network estimation process not to fail. Consequently, the ll method is better when the underlying network is dense rather than sparse, because dense networks comprise more variance and thus more information. In contrast, the probability that the ll method fails is higher when there are many nodes, when the sample size is small or when the underlying network is loosely connected. For a successful network estimation with the ll method on a dense underlying structure, the sample should contain at least 250 observations for a 10 node network and at least 500 observations for a 20 node network, to reach a probability of estimation success of at least 50% and 22%, respectively (Figure 5, panel 14 and 15).

When the failure of an Ising model estimation is not an option, the log-linear method is not the best choice, as Figure 5 clearly shows, because this method fails frequently in networks with many nodes. The second best option is to use the eLasso method (Van Borkulo et al., 2014). Whether a hyperparameter γ of 0 or 0.25 is chosen does not influence the results much, although the γ = 0 version performs slightly better when the network is sparse. These results are unexpected because the eLasso method should work better when the underlying network is sparse rather than dense, since the LASSO regularization is only effectively applicable in sparse networks.

The pseudolikelihood and the univariate logistic regression method concluded with intermediate results. The gap in results between the eLasso and FDI2 methods on the one hand and the pl and uni methods on the other might be due to the application of regularization. Whereas the former apply regularization and can thus cope with a large number of parameters, the pl and uni method do not, which seems to worsen the fit of their model estimation. Only up to 10 nodes do the pl and the uni method reliably estimate the network parameters. Regularization seems to distort the network estimation somewhat for small networks with few nodes; in these networks, all logistic regression methods except the bi method perform slightly better than the regularization methods.

The FDI4 and the multinomial logistic regression method are not suited to estimate Ising model networks with the network characteristics that were tested in this simulation study. The most plausible reason for the bad performance of the FDI4 method is that the underlying network did not have a low-rank structure. The FDI2 method, which assumes two dominant latent factors, could obtain reasonable results, whereas assuming four latent factors, as the FDI4 method does, caused a lot of distortion. The stagnating lines of the FDI4 method in Figure 2 indicate that an increase in sample size, and thus an increase in information, does not lead to better results. The conclusion with regard to the FDI methods is that a high number of ranks should only be used when there is a strong theoretical reason to assume an underlying network structure with many latent factors, or when preliminary analyses clearly suggest a high number of latent factors that explain the data. The FDI method with two ranks (FDI2) obtains satisfactory results and can be applied even if there is no underlying low-rank structure. While the bi method is not worse than the other methods in five node networks, it is much worse with more nodes. Interestingly, when node size is 20 and many parameters have to be estimated, the bi method often still obtains results even though the estimations bear no relation to the true networks (correlations are almost zero), whereas the log-linear method completely fails in most cases. This makes the ll method reliable in the sense that we know the obtained results are meaningful whenever the network estimation does not fail.

Figure 5. Proportion of failed Ising model estimations. The figure depicts the conditions: number of nodes P (at the top), sample size (at the bottom), noise as connectivity of the network (at the right) and estimation methods as coloured lines. The grey number in the top left corner corresponds to the panel number.

The results of the threshold estimation are very similar to those of the network parameter estimation. The eLasso method outperforms all other methods in all conditions except when the underlying network is dense (Figure 3, panel 11 to 15). With regard to the threshold parameter estimation, the results are in line with Van Borkulo et al. (2014)'s suggestion, namely that the eLasso method with a hyperparameter γ = 0.25 performs slightly better than with γ = 0. Noticeably, the increased number of parameters when there are many nodes has a greater negative impact for the multinomial and univariate logistic regression methods when estimating thresholds than when estimating network parameters.

Study II: Missing Data Imputation

Missing data are a frequent problem in scientific, and especially in psychological, research and have led to numerous attempts to resolve this problem and the misconceptions related to it (Schafer & Graham, 2002). A distinction between different causes and types of missing data is helpful to handle the missing data problem. Schafer and Graham (2002) name three different causes of missingness. A unit nonresponse occurs when the data collection procedure failed completely, for example when an online survey is not displayed correctly on a subject's computer and the subject cannot take the survey. Missing data on particular variables are termed item nonresponse, for example when a subject refuses to answer an item about drug use. The cause of missing data is a wave nonresponse when subjects missed measurement occasions in a longitudinal design, for example in the event that a subject dies before data collection is completed. Only in the case of an item nonresponse or a wave nonresponse does it make sense to impute data, because in the unit nonresponse case no data of the subject are available and consequently there is no information from which to impute missing values.

Next to the cause of missingness, it is important to determine whether data are missing systematically or at random (Rubin, 1976). Rubin (1976) suggests three types of missing data: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). He states that MCAR means that the probability of missingness is neither related to the observed data nor to the (unobserved) missing data, and hence it is completely random. In the MAR case, the probability of missingness is related to the observed data but not to the missing data, for example when the probability of not answering an item in a survey is higher because of item answers that were given before. Missing data are labelled MNAR when the probability of missingness is related to the missing data, for example when the probability of not answering an item is higher because other items have not been answered before. In the two cases where the data are missing (completely) at random, no systematic bias is present, or at least it does not influence missing data imputation. When data are not missing at random, there is a systematic bias that will distort the analysis, because a systematic difference exists between subjects with and without missing data (Huisman, 2009). To conclude, missing data imputation is valid only when the cause of missing data is an item nonresponse or a wave nonresponse and when the type of missing data is MCAR or MAR.

Figure 6. Concept graph of the iterative imputation procedure with steps a) to d).

In the data imputation study we explain the imputation procedure first. Thereafter, supplementary repair mechanisms related to the imputation procedure are discussed. For some Ising model estimation methods it is necessary to apply a repair mechanism to prevent errors during the iterative imputation procedure. The repair mechanisms that will be discussed are "controlling for threshold differences" (ThresDiff) and the "metropolis step" (mStep). The simulation results with regard to the repair mechanisms, the network and threshold parameters and the duration of the imputation are reviewed in the results section; discussion remarks, implications and integration of the findings in the broader context follow in the discussion section.

Imputation Procedure

The basic idea of the imputation procedure is to iteratively replace the missing values with the values that are most plausible given the (non-missing) observed data. The most plausible values are imputed by means of information from the underlying Ising network model, which is repeatedly estimated based on the observed data and the newly imputed data.
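The iterative procedure (steps a to d, detailed below) can be sketched in miniature. The sketch below is in Python rather than the paper's R implementation: sampling uses the Ising conditional p(x_j = 1 | rest) = sigmoid(tau_j + Σ_k w_jk x_k) instead of the IsingSampler package, and `estimate_stub` is a deliberately trivial placeholder for the estimation step, where a real run would plug in eLasso, uni, or an FDI method. All names are our own.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def impute_missing(data, thresholds, weights, rng):
    """Step b): draw each missing cell from the Ising conditional
    p(x_j = 1 | rest) = sigmoid(tau_j + sum_k w_jk * x_k)."""
    for row in data:
        for j, v in enumerate(row):
            if v is None:
                z = thresholds[j] + sum(weights[j][k] * (row[k] or 0)
                                        for k in range(len(row)) if k != j)
                row[j] = 1 if rng.random() < sigmoid(z) else 0

def estimate_stub(data):
    """Step c) placeholder: thresholds from the marginal means (the
    zero-weight Ising MLE); a real run would plug in eLasso, uni, etc."""
    n, p = len(data), len(data[0])
    means = [min(max(sum(r[j] for r in data) / n, 0.01), 0.99) for j in range(p)]
    return [math.log(m / (1 - m)) for m in means], [[0.0] * p for _ in range(p)]

rng = random.Random(0)
p = 4
observed = [[rng.randint(0, 1) for _ in range(p)] for _ in range(50)]
for row in observed:                          # punch MCAR holes into the data
    if rng.random() < 0.3:
        row[rng.randrange(p)] = None

thresholds = [0.0] * p                        # step a): "empty" model
weights = [[0.0] * p for _ in range(p)]
for _ in range(5):                            # iterate steps b) and c)
    completed = [row[:] for row in observed]
    impute_missing(completed, thresholds, weights, rng)   # step b)
    thresholds, weights = estimate_stub(completed)        # step c)
```

Note that the missing cells are re-sampled from the current model at every iteration, so the imputations gradually absorb the structure of the re-estimated model.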

The procedure starts by creating a weight matrix and a threshold vector that contain only zeros, which means that there are no connections between nodes and that there is no tendency for a node to be in a particular state. This situation is depicted in Figure 6 at step a). The missing values in the data matrix, in which the columns represent the nodes and the rows the observations, are imputed based on these all-zero Ising model parameters by means of the R package "IsingSampler" (Epskamp, 2015). These first imputed values do not contain any information about the observed data, because they are sampled according to an "empty" Ising network model. Step b) in Figure 6 displays the drawing of samples based on the Ising model to fill in the missing values. In the third step c), the Ising network model is estimated based on all data, that is, the observed data and the missing values that were just sampled. From the Ising model estimation, we obtain an "updated" weight matrix and threshold vector based on the data. In the fourth step d), a repair mechanism can be applied. This mechanism is only applicable and necessary for Ising model estimation techniques that would otherwise fail and lead to imputed values of poor quality. After the


logistic regression method. Preliminary simulations showed that during the imputation process it can happen that threshold values strongly increase while values in the weight matrix strongly decrease, or the other way around. The cause of parameters increasing or decreasing towards infinity is that they are not identified due to a too strongly connected network. When the network gets too strongly connected during the iterative process, the variance of the sampled data decreases and less information is available to estimate the network model. With too little information, some parameters are no longer identifiable and the Ising model estimation fails. Two repair mechanisms are discussed to prevent this problem from happening. The ThresDiff mechanism stops the imputation process when unreasonable threshold values are obtained and returns to a previous iteration. The mStep algorithm (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953) computes the pseudolikelihood and accepts newly imputed values only when they are reasonable.
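Before turning to the details, the core ThresDiff idea, rejecting an iteration when a threshold changes implausibly much, can be sketched as follows. This Python sketch only implements the outlier band (six standard deviations around the mean change, as described in the next subsection); the stop-restart-and-average logic of the full mechanism is omitted, and the function name `thresdiff_ok` is our own.

```python
import statistics

def thresdiff_ok(curr_diffs, history):
    """ThresDiff-style check: flag the iteration when any threshold change
    falls outside mean +/- 6 SD of the changes seen so far, then record
    the new changes for the next iteration's band."""
    ok = True
    if len(history) >= 2:
        mu = statistics.mean(history)
        sd = statistics.stdev(history)
        ok = all(mu - 6 * sd <= d <= mu + 6 * sd for d in curr_diffs)
    history.extend(curr_diffs)
    return ok

history = []
step1 = thresdiff_ok([0.1, -0.1], history)   # too little history yet: accept
step2 = thresdiff_ok([0.1, 0.1], history)    # within the band: accept
step3 = thresdiff_ok([8.8, 0.1], history)    # runaway threshold: reject
```

The wide six-SD band means ordinary iteration-to-iteration drift passes, while the explosive growth that precedes an estimation failure is caught.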

Controlling for Threshold Outliers (ThresDiff)

The principle of the ThresDiff mechanism is to monitor the change in threshold values at every iteration to detect and prevent failure of the imputation process. The Ising model threshold values are saved per iteration, and from the beginning of the second iteration on, the difference between the previous and the current threshold vectors is computed. An interval of acceptable values is determined by adding and subtracting six standard deviations from the average difference value. At every iteration, this range of acceptable values is updated and it is checked whether the current threshold difference values are all within the range of acceptable threshold values. The wide range of six standard deviations was chosen because, even without a network estimation failure in sight, the threshold values can change considerably. When the threshold difference check fails because a threshold value is not in the range of acceptable values, the imputation procedure is stopped. In such a situation the repair mechanism saves the number of iterations, the weight matrix and the threshold vector of the second to last iteration before the procedure failed. The mechanism returns to the second to last iteration because then the parameters were still meaningful. The whole imputation procedure is repeated four times. After the fourth time, the final model is computed by averaging the four weight matrices and threshold vectors of the four imputation procedures, weighting them by the number of iterations they have run through. The missing values are sampled from the final model. For example, when there is no failure in three imputation procedures but the fourth procedure fails at iteration 5 of 25, then procedures one to three get a weight of 0.31 while the last imputation procedure is

In the metropolis algorithm, the fact that worse solutions are sometimes accepted makes it possible to find solutions that could not have been found if only better solutions were accepted. In mStep, the pseudolikelihood of the Ising model given the observed and imputed data is computed and saved at the first iteration of the imputation procedure. In all following iterations, the pseudolikelihood is computed based on the Ising model and the newly imputed data, and the proposed pseudolikelihood (PLpro) of the present data is compared to the current pseudolikelihood (PLcur) of the previous iteration. The probability of accepting the present data is:

p(accept) = e^(PLpro − PLcur)

This quantity is greater than or equal to one when the proposed pseudolikelihood is equal to or higher than the current pseudolikelihood, in which case the new data are always accepted. The higher the current pseudolikelihood compared to the proposed one, the smaller the probability that the newly imputed data will be accepted as an improvement. The imputed data at the end of the imputation process have the highest pseudolikelihood of all Ising models generated during the iterative imputation process based on the observed and imputed data.
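The acceptance rule can be sketched directly from the formula above. This is an illustrative Python sketch, not the paper's R code; `log_pseudolikelihood` computes the sum of log Ising conditionals over all cells (one standard way to define the pseudolikelihood), and the function names are our own.

```python
import math, random

def log_pseudolikelihood(data, thresholds, weights):
    """Sum of log p(x_j | rest) over all cells, under the Ising conditionals."""
    ll = 0.0
    for row in data:
        for j, x in enumerate(row):
            z = thresholds[j] + sum(weights[j][k] * row[k]
                                    for k in range(len(row)) if k != j)
            p = 1.0 / (1.0 + math.exp(-z))
            ll += math.log(p if x == 1 else 1.0 - p)
    return ll

def mstep_accept(pl_proposed, pl_current, rng):
    """Accept the newly imputed data with probability min(1, exp(PLpro - PLcur))."""
    return rng.random() < min(1.0, math.exp(pl_proposed - pl_current))

rng = random.Random(0)
better = mstep_accept(-10.0, -12.0, rng)   # higher pseudolikelihood proposal
worse = mstep_accept(-12.0, -10.0, rng)    # accepted only with prob e^-2
```

A proposal that improves the pseudolikelihood is always kept, while a worse proposal survives only occasionally, which is what lets the chain escape locally attractive but poor imputations.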

Methods

Conditions

The same underlying network structure as in the method simulation was applied in the imputation simulation. Networks with 10, 15 and 20 nodes were tested. The sample size condition had three levels: 250, 500 and 1000 observations. Missing values were generated completely at random. The missing value condition had three levels: 10%, 25% and 50% missing values. Furthermore, there were four different Ising model estimation methods in the imputation method condition: univariate logistic regression (uni method), eLasso with the default γ hyperparameter of 0.25 (eLasso0.25) and the Full-Data-Information method in a two and a three low-rank version (FDI2 and FDI3, respectively). These methods were selected based on the results of the estimation method simulation in Study I. Although we would have preferred to include the well performing log-linear method, this was not possible because its estimation fails frequently for networks with 10 or more nodes. The uni method was included with 25 imputation iterations to see whether a logistic regression approach can cope with missing data. Logistic regression approaches also struggle when many parameters have to be estimated, but failure of these methods can be avoided by applying repair mechanisms. Furthermore, the eLasso0.25, the FDI2 and the FDI3 method were included to test the imputation performance of the two different regularization approaches. The two and three low-rank FDI methods, both with four imputation iterations, were included to take different numbers of low-ranks into account, because the estimation method simulation revealed that the number of ranks can have a profound influence on the imputation results. The multinomial logistic regression and the FDI4 method were excluded because of their poor performance in the simulation study. There were only slight differences between the eLasso0 and eLasso0.25 methods; therefore we chose to include the eLasso0.25 method with 15 imputation iterations, as suggested by the authors (Van Borkulo et al., 2014). The pseudolikelihood method was not taken into account because of the extensive time that this method needed to estimate an Ising network model with the current implementation6. It would not have been possible to obtain a sufficient number of simulations for the pl method in a feasible amount of time. Further imputation simulations should include this promising method.

Figure 7. Performance of repair mechanisms applied to missing value imputation with the univariate logistic regression method on estimating Ising network models. ThresDiff, mStep and no repair mechanism are compared on the network parameter correlations between parameters that have been estimated with and without missing data. The colored lines represent the network parameter estimation with missing data and the black star without missing data (optimal correlation). The grey number at the right bottom corner corresponds to the panel number.

5 In most metropolis step algorithms, the probability is also related to a slowly decreasing temperature. When temperature is high, a bad solution is more likely to be accepted than at the end of the process when the temperature is low.

The total number of combinations in the imputation simulation was 324, with four different imputation methods and three levels of noise, nodes, sample size and proportion of missing data. Every combination was simulated 100 times, which amounted to 32400 simulations in total. In addition, imputation with the uni method was tested with two different repair mechanisms and no repair mechanism, which totalled 16200 additional simulations.
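The MCAR deletion used for the missing value condition (10%, 25% or 50% of cells removed completely at random) can be sketched as follows; `make_mcar` is our own illustrative name, and the cell-wise independent deletion scheme is an assumption about one natural way to realize MCAR.

```python
import random

def make_mcar(data, prop, rng):
    """Delete each cell independently with probability `prop` (MCAR:
    missingness depends on neither observed nor unobserved values)."""
    return [[None if rng.random() < prop else v for v in row] for row in data]

rng = random.Random(42)
full = [[rng.randint(0, 1) for _ in range(20)] for _ in range(250)]  # 20 nodes, 250 obs
with_na = make_mcar(full, 0.25, rng)
share = sum(v is None for row in with_na for v in row) / (250 * 20)
```

Because each cell is deleted independently, the realized share of missing values fluctuates around the nominal proportion.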

Performance Measurement

The performance was measured by computing the correlation between the underlying parameters and the parameters estimated without missing data (the same performance measurement as in Study I). This correlation was termed the "optimal" correlation. The "imputed" correlation was the association between the parameters of the underlying network and the parameters of the network estimated with missing values. Performance was measured by the difference between the "optimal" and the "imputed" correlation: the smaller the difference, the better the performance. This performance measurement allowed us to study to what extent the presence of missing values influences the estimation of the Ising model.
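The performance measure can be written out directly. In this Python sketch, `pearson` and `performance_gap` are our own names and the parameter vectors are hypothetical numbers for illustration only, not results from the study.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation between two equally long vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def performance_gap(true_params, est_complete, est_with_missing):
    """'Optimal' correlation (true vs. complete-data estimate) minus
    'imputed' correlation (true vs. estimate after imputation);
    smaller values mean better imputation performance."""
    optimal = pearson(true_params, est_complete)
    imputed = pearson(true_params, est_with_missing)
    return optimal - imputed

# Hypothetical weight vectors for a six-edge network.
true_w   = [0.0, 0.8, 0.0, 0.5, 0.2, 0.0]
complete = [0.1, 0.7, 0.0, 0.6, 0.1, 0.0]   # estimated on full data
imputed  = [0.3, 0.5, 0.2, 0.4, 0.3, 0.1]   # estimated after imputation
gap = performance_gap(true_w, complete, imputed)
```

A gap near zero means the imputation barely hurt the estimation; a large positive gap means the missing data degraded the recovered network.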

Results

Repair Mechanism

Figure 7 shows the comparison between the ThresDiff, the mStep and no repair mechanism when estimating Ising network models with the uni method. In all 9 panels of Figure 7, the black star represents the optimal correlation; the bottom group of lines represents the 10% (solid line), the middle group the 25% (dotted line) and the top group the 50% (dashed line) missing values scenario. Overall, the differences in performance between the repair mechanisms are small; the closer the colored lines are to the black star, the smaller the difference between optimal and imputed correlation, and the better the results. For P = 10 nodes, all repair mechanisms perform better with an increasing sample size, because all lines approach the optimal correlation star; a line touching the optimal correlation star means that there is no difference between the optimal and the imputed correlation. For example, for P = 10 and 50% missing values (Figure 7, panel 1, 4 and 7), the mean correlation difference is 0.41 (SD: 0.17) for 250 and 0.297 (SD: 0.124) for 1000 observations. There is almost no difference between repair mechanisms for the 10 nodes scenario due to the small differences in means and relatively large standard deviations. Panel 2 and 5 in Figure 7 indicate that the red coloured ThresDiff line is closer to the optimal correlation star than the other lines for the 50% missing values and P = 15 nodes case. Figure 7 also shows that in a network of 20 nodes with 50% missing values, the ThresDiff correlation differences are smaller than those of the mStep and of no repair mechanism. When the underlying network is sparse (Figure 7, panel 3), the mean difference of ThresDiff is 0.077 (SD: 0.104) and that of the two other approaches 0.137 (SD: 0.115). A similar pattern is found for medium dense and dense networks. Although we have to conclude that there are only slight differences in imputation performance between the repair mechanisms in networks with few variables, we chose ThresDiff as the best working approach because it outperformed the mStep and no repair mechanism in P = 15 and P = 20 networks. Based on these results, we decided to include the univariate logistic regression method with the ThresDiff repair mechanism in the main imputation method analysis.

6 The long duration of a pl method estimation is not related to the computation of the pseudolikelihood but to the actual technical implementation of this estimation method in R (R Core Team, 2014).

Figure 8. Performance of imputation methods, eLasso0.25, FDI2, FDI3 and uni method on network parameter correlations between parameters that have been estimated with 10% missing values and with complete data. The colored lines represent the network parameter correlation with missing values and the colored stars represent the optimal network parameter correlation (without missing values). The grey number at the right bottom corner corresponds to the panel number.

Imputation: Network Parameters

Figure 8 shows the optimal (colored star) and imputed network parameter correlations with a share of 10% missing values, Figure 9 with 25% missing values and Figure 10 with 50% missing values. Contrary to the method estimation simulation, a higher sample size does not lead to a better imputation result in most cases, because the distance between the lines and the associated stars does not change considerably between different sample sizes. There are only small differences between imputation methods in the 10% missing value scenario because the effect of missing values on the network estimation is limited overall. In general, the differences in correlation lie in the range between 0.004 and 0.141 with a mean difference of 0.07 (SD: 0.027). Panel 3, 4 and 6 of Figure 8 suggest that the eLasso0.25 does not perform as well as the other methods, but the standard deviations are too large to conclude that the result is significantly worse.


Figure 9. Performance of imputation methods, eLasso0.25, FDI2, FDI3 and uni method on network parameter correlations between parameters that have been estimated with 25% missing values and with complete data. The colored lines represent the network parameter correlation with missing values and the colored stars represent the optimal network parameter correlation (without missing values). The grey number at the right bottom corner corresponds to the panel number.

The picture looks similar for the 25% missing value scenario (Figure 9), with an average correlation difference of 0.182 (SD: 0.065). When noise is 0 or 0.1, the eLasso0.25 method performs worse than the other two methods. This finding is striking given that eLasso0.25 should be better suited to impute missing values in sparse than in dense networks. The results show the opposite: when networks are dense (Figure 9, panel 7 to 9), the eLasso0.25 method obtains equally good results as the other methods, and only in the P = 20 and noise 0 or 0.1 scenarios are the results worse, with a mean of 0.21 (SD: 0.128, aggregated over noise levels 0 and 0.1). The uni method, on the other hand, performs well in 20 node networks with a mean difference of 0.064 (SD: 0.05), especially when the network is sparse or medium dense. The results for the 20 node networks have to be interpreted with regard to the high variance in optimal correlations between the imputation methods.

In the 50% missing value scenario, the uni method obtains better results than all the other methods in every condition. The greatest differences emerge for 20 nodes and a sparse network (Figure 10, panel 3), in which the average difference for the uni method is 0.077 (SD: 0.03) and for the second best FDI3 method 0.237 (SD: 0.069). The bad performance of the eLasso0.25 method seen in the 10% and 25% missing value cases is confirmed, as clearly visible in Figure 10, although this method is not worse than the FDI2 and FDI3 method in dense networks with fewer than 20 nodes (Figure 10, panel 7 and 8). Overall, the FDI methods obtain average results in all conditions, in between the eLasso0.25 and the uni method. It is noteworthy that the different numbers of low-ranks did not influence the imputation results much.

Threshold Parameters & Duration

Figure 11 shows the difference between optimal and imputed threshold parameters averaged over all proportions of missing data. In general, the difference in threshold parameters is much smaller than in network parameters: all imputation methods are close to their optimal correlation star and the mean difference for all methods over all conditions is exactly 0 (SD: 0.037). Based on these results, we can conclude that the quality of the imputed values has a much greater influence on the network parameters than on the threshold parameters.

We measured how long the methods took to complete an imputation process with R version 3.1.0 (R Core Team, 2014). Two computers were used for the imputation process: a Lenovo G710 with a 2.2 GHz Intel Core i7 processor (four cores) and a Lenovo ThinkPad T440 with a 1.9 GHz Intel Core i5 processor (four cores). Averaged over all conditions, the eLasso0.25 was the fastest imputation method, with a mean duration of 10.543 seconds (SD: 7.001), followed by the uni method with 14.27 seconds (SD: 12.13) when ThresDiff was applied. Naturally, the uni method should take longer, because a repair mechanism was applied, which added extra time to the process. When no repair or the mStep mechanism was applied, the duration of the uni method was less than half as long as with the ThresDiff mechanism. Finally, the FDI2 and FDI3 methods took the longest on average, with 25.9 seconds (SD: 10.604) and 36.793 seconds (SD: 25.759), respectively. The proportion of missing values, the network connectivity and the number of nodes had a negligible influence on duration. Only an increase in sample size led to a considerable increase in imputation duration.

Figure 10. Performance of imputation methods, eLasso0.25, FDI2, FDI3 and uni method on network parameter correlations between parameters that have been estimated with 50% missing values and with complete data. The colored lines represent the network parameter correlation with missing values and the colored stars represent the optimal network parameter correlation (without missing values). The grey number at the right bottom corner corresponds to the panel number.

Discussion

The missing data imputation simulation suggests that the application of the ThresDiff repair mechanism leads to the best results. The ThresDiff repair mechanism makes it possible to detect and resolve overfitting during the imputation process. The results (Figure 7) show that this mechanism yields particularly good estimations in the case of 15 and 20 node networks, when many parameters have to be estimated. The application of a good repair mechanism contributed to the good results of the univariate logistic regression method. While there is not much difference between the tested imputation methods in the 10% missing value case, there are profound differences when more data are missing. In the 50% missing value case the conclusion is clear: the univariate logistic regression method is better than the other methods at imputing missing values and estimating an Ising model with network parameters that are very similar to the network parameters that would have been obtained without any missing data. The imputation simulation also led to an unexpected result with regard to the eLasso method: why did this method perform the best when estimating a network but the worst when imputing missing data? There are several possible answers to this question.

Firstly, the LASSO regularization could be a cause. LASSO forces small regression coefficients to shrink to zero and thereby changes the structure of the network, which is represented by the network parameters. This is a plausible cause for the eLasso performance, although we should consequently find that the imputation performance is better in sparse than in dense networks, which is not the case.
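The shrinkage behavior referred to here can be illustrated with a minimal sketch (this is only an illustration of L1 shrinkage in a logistic regression, not the eLasso implementation itself; the data and parameter values are hypothetical):

```python
# Illustrative sketch: L1-penalized logistic regression drives small
# coefficients to exactly zero. Data and penalty strengths are
# hypothetical, not taken from the simulation study.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 500, 14                       # hypothetical sample size and predictor count
X = rng.integers(0, 2, size=(n, p))  # binary "neighbor" variables
# The outcome depends on only three of the fourteen predictors.
logit = -0.5 + 1.2 * X[:, 0] + 0.8 * X[:, 1] - 1.0 * X[:, 2]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

for C in (10.0, 0.1):                # smaller C = stronger L1 penalty
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    n_zero = int(np.sum(fit.coef_ == 0))
    print(f"C={C}: {n_zero} of {p} coefficients shrunk to exactly zero")
```

A stronger penalty typically zeroes out more of the noise predictors, which is exactly the mechanism by which the estimated network becomes sparser.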

Secondly, a cause could lie in the comparison between the optimal correlation without missing values and the imputed correlation with missing values. It could be argued that when the optimal correlation is high, it is more difficult to obtain an imputed correlation that is equally high than when the optimal correlation is low. The optimal correlation of the eLasso estimation method with regard to network parameters is among the highest, as the results of study I showed. The colored stars show that the optimal correlation of the eLasso method is considerably higher than the optimal correlation of the uni method, especially in P = 20 node networks. This can partly explain the great differences in imputation results, especially in the 50% missing values scenario. Future Ising model imputation studies should find a way to control for the effect of the optimal correlation on the imputation performance. We cannot exclude the difference in optimal correlations as a cause of the difference in imputation results until we are able to control for this effect.

16 JONATHAN KLAIBER

Figure 11. Performance of imputation methods eLasso0.25, FDI2, FDI3 and uni method on threshold parameter correlations between parameters that have been estimated with 10%, 25%, and 50% missing values and with complete data. The colored lines represent the threshold parameter correlation with missing values and the colored stars represent the optimal threshold parameter correlation (without missing values). The grey number in the bottom right corner corresponds to the panel number.

Thirdly, a cause could be the differences in the number of imputation iterations between the methods: the more iterations are applied, the greater the possibility to converge and hence the better the result (at least up to a certain number of iterations). Whereas the number of imputation iterations was only 15 for the eLasso method, 25 iterations were permitted for the uni method. This choice was made because preliminary simulations showed that the uni method needs more iterations to arrive at good results. In the case of the uni method, even more iterations were added by the ThresDiff repair mechanism because every missing value imputation was repeated four times in order to control for faulty imputations. Thus, the uni method benefited from many more iterations compared to the eLasso method. We cannot disregard this as an explanation for the differences in imputation results between the two methods, and we strongly advise including the number of iterations as a condition in further missing data imputation studies.
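To make the iterative univariate imputation idea concrete, the following is a minimal sketch of such a loop for binary data. It is an illustration of the general principle only, not the paper's exact uni method, and it omits the ThresDiff repair mechanism; all names and parameter values are hypothetical:

```python
# Sketch of iterative univariate logistic-regression imputation for
# binary data: each variable with missing cells is regressed on the
# others, and missing entries are redrawn from the fitted probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def impute_binary(data, n_iter=25):
    """data: 2D float array of 0/1 values with np.nan for missing cells."""
    X = data.copy()
    miss = np.isnan(X)
    # Start from random guesses for the missing cells.
    X[miss] = rng.integers(0, 2, size=miss.sum())
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = miss[:, j]
            if not rows.any():
                continue
            others = np.delete(X, j, axis=1)
            obs = ~rows
            if len(np.unique(X[obs, j])) < 2:
                continue  # cannot fit a logistic regression on one class
            model = LogisticRegression(max_iter=200).fit(others[obs], X[obs, j])
            p1 = model.predict_proba(others[rows])[:, 1]
            X[rows, j] = (rng.random(rows.sum()) < p1).astype(float)
    return X

# Usage: a small binary data set with roughly 20% of cells set to missing.
full = rng.integers(0, 2, size=(200, 5)).astype(float)
data = full.copy()
holes = rng.random(full.shape) < 0.2
data[holes] = np.nan
completed = impute_binary(data)
print("remaining missing cells:", int(np.isnan(completed).sum()))
```

In this sketch the number of iterations `n_iter` is an explicit tuning knob, which mirrors the point above: allowing one method more iterations than another confounds any comparison of their imputation quality.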

Study III: Imputing Real Data

We imputed missing values in a real world data example as an illustration of the imputation procedure. The example data were taken from an adaptive practice and monitoring system named Math Garden7 (MG; Klinkenberg et al., 2011). We took correct and incorrect item answers from the addition game. An addition game item is a sum of two numbers, for example “3 + 39”. Children who play the addition game in MG see an item question (the sum) on the screen while they are provided with six possible answer options; for the item “3 + 39”, the options are for instance 42, 32, 43, 69, 31 and 24. The players have to choose an answer option within 20 seconds and receive immediate feedback about whether the item was correctly or incorrectly answered. The binary item answers (correct/incorrect) were used as data for the application of the imputation procedure to estimate an Ising network model of item relationships in the addition game. We compared the relationships at an early time point in a school year with the relationships at a late time point. We might be able to see changes in item relationships due to the math education and arithmetic practice the children received between the early and late time points.

Methods

All items that were played from 24.11.2014 to 30.11.2014 belonged to the early group and items played from 24.02.2015 to 02.03.2015 to the late group. Children were excluded from the analysis when they did not meet specific requirements intended to guarantee the reliability of the data. We only included children that had played at least 50 items since they had been registered in MG. Further, children were not included in the analysis if their birthday was still the default option (first of January) after they had been registered. Moreover, children were not included when they were younger than four or older than 13, or when they were more than one and a half years older than the average age in their grade. Finally, we compared children in an early stage of grade three and grade four8 with children in a late stage of these grades. In these groups of children, we selected the 15 most frequently played items per grade for the analysis9. Ultimately, 477 children with 20% missing values were selected for the early time point of grade three, 215 children with 36% missing values for the late time point, 220 children with 40% missing values for the early time point of grade four and 231 children with 44% missing values for the late time point of grade four. The missing values were classified as wave nonresponse because children missed “measurement occasions” (the items) due to the adaptive nature of the MG system. Whether a child played a certain item was determined by the correct and incorrect item answers that the child had given before; consequently, the missing values were MAR. The imputation of the missing data took place with the uni method paired with the ThresDiff repair mechanism, because this method proved to be the best in the imputation simulation.

7A description of the adaptive technology behind Math Garden is

Figure 12. The network structure of the MG real world example of an Ising model of correct/incorrect answers for the 15 most played addition items of children in grade three and four. The data for the early network graph were retrieved from the Dutch version of Math Garden in the last week of November 2014 and the late network graph in the last week of February 2015. Only node connections with a network parameter higher than 0.4 are displayed; the green connections represent positive and the red connections negative relationships between items. The thickness and the opacity correspond to the strength of a connection.
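The inclusion criteria described in the Methods above can be expressed as a simple row filter. The sketch below is purely illustrative: the data frame and its column names are hypothetical and do not reflect the actual Math Garden data schema.

```python
# Hypothetical sketch of the inclusion criteria; the data frame and
# column names are illustrative, not the real Math Garden schema.
import pandas as pd

children = pd.DataFrame({
    "played_items": [120, 30, 80, 200],
    "default_birthday": [False, False, True, False],  # still 1 January?
    "age": [9.0, 10.0, 9.5, 15.0],
    "grade_mean_age": [9.2, 9.2, 9.2, 9.2],
})

keep = (
    (children["played_items"] >= 50)                       # at least 50 items played
    & ~children["default_birthday"]                        # birthday was changed
    & children["age"].between(4, 13)                       # plausible age range
    & (children["age"] - children["grade_mean_age"] <= 1.5)  # not far above grade mean
)
selected = children[keep]
print(len(selected), "of", len(children), "children retained")
```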

Results

Figure 12 depicts the associative relations between the 15 most played addition items of children in grade three (networks at the top) and the 15 most played items in grade four (networks at the bottom). Items that are played in grade three are naturally easier than the most frequently played items in grade four because children in grade four have a higher “addition” ability than children in grade three. A noticeable observation with regard to the four networks in Figure 12 is that the networks of the late time points have stronger connections than their early counterparts. The early grade three network (Figure 12, top left) exhibits a strong negative association between the items “3 + 3” and “40 + 4”. This could mean that there are many children in this group who are able to answer “3 + 3” correctly but have not yet learned to sum numbers up to 40 and consequently answer “40 + 4” incorrectly. This reasoning can also be applied to the items “11 + 4” and “72 + 2” of the late grade four group. Apparently, there are children in this group who are able to answer “11 + 4” correctly but fail when answering “72 + 2”. When making these claims we have to keep in mind that the addition game is a multiple choice game. This means that the difficulty of an item is not only determined by the difficulty of the actual sum that has to be answered, but also by the quality of the false answer options; when children are not distracted by the incorrect answer options, the item is easier than when they are distracted. In the late versions of grade three and four there are more negative relationships between items, which probably means that the group playing these items is in between answering the easier item of the two correctly and the more difficult item incorrectly. An interesting question is why there are (positive) connections between some items but not between others. A positive connection between two nodes means that both items are either

8Grade three and four in the Dutch education system.

9When an item was played multiple times by a child, the mean of the item answers was computed and then rounded to zero or one to obtain a single item answer per user.
