University of Groningen Modeling the dynamics of networks and continuous behavior Niezink, Nynke Martina Dorende



Document Version

Publisher's PDF, also known as Version of record

Publication date: 2018


Citation for published version (APA):

Niezink, N. M. D. (2018). Modeling the dynamics of networks and continuous behavior. University of Groningen.


6

Continuous versus discretized behavior

6.1 Introduction

Social networks are especially interesting in light of their interdependent dynamics with the behavior of individual network actors. Peer selection based on shared behavior and social influence are two examples of interdependencies between networks and individual behavior. The stochastic actor-oriented model enables the simultaneous analysis of the dynamics (or co-evolution) of social networks and the attributes of individual network actors.

In the original stochastic actor-oriented model for network-attribute co-evolution, actor attributes are assumed to be measured on an ordinal discrete scale (Snijders et al., 2007; Steglich et al., 2010). This assumption allows for the entire co-evolution process to be modeled by a continuous-time Markov chain with a finite discrete state space (Norris, 1997). Continuous actor attributes have to be discretized to fit into this modeling framework.

Researchers have tackled the issue of discretizing continuous actor attributes in various ways. One solution is to use all recorded values, or all theoretically possible values, of the outcome variable to define the ordinal scale. De Klepper, Labianca, Sleebos, and Agneessens (2017) multiplied the average of 10-point scale peer ratings on competence status by 10 for this purpose. This solution implies no loss of information. However, a scale with many categories leads to many unit changes, and thus to high rate parameters (24.9 in the De Klepper et al. study), and consequently to a longer computation time.

A second solution is rounding. Caravita, Sijtsema, Rambaran, and Gini (2014) rounded the average score of a moral disengagement scale to the nearest integer to obtain a scale with five categories. Note that, if scores were uniformly distributed between 1 and 5, the outer categories in the rounded scale would, by construction, be less populated.
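The remark on rounding can be checked with a small calculation. The sketch below is only an illustration (the function name is ours and does not come from the cited studies): for scores that are uniformly distributed on the interval from 1 to 5, it computes the share of scores that end up in each rounded category.

```python
def rounded_category_shares(low=1.0, high=5.0):
    """Share of Uniform(low, high) scores that round to each integer category."""
    shares = {}
    for category in range(round(low), round(high) + 1):
        # Scores in [category - 0.5, category + 0.5) round to `category`;
        # clip this interval to the range of the scale.
        left = max(low, category - 0.5)
        right = min(high, category + 0.5)
        shares[category] = (right - left) / (high - low)
    return shares


print(rounded_category_shares())
# {1: 0.125, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.125}
```

Under this assumption the two outer categories each receive half the share of an inner category.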


Third, the scale of the variable is split into parts of equal length. Gesell et al. (2012), for example, collapsed children's percentage scores of play time spent in moderate-to-vigorous physical activity into deciles (0–9%, 10–19%, etc.). Others split the scale into parts determined by the standard error of the variable (e.g., Dijkstra et al., 2012; Duriez, Giletta, Kuppens, and Vansteenkiste, 2013). Fourth, variables are categorized in such a way that all categories are equally populated. Flashman (2012), for example, transformed a grade point average (GPA) into quintiles so that a one-unit increase in GPA represented a movement, for example, from the bottom 20 percent to the 20–40 percentile. Other studies have considered categories with cut-offs chosen based on the theoretical quintiles given the mean and standard error in the original distribution. Such transformations completely do away with the original distribution of a variable. Finally, pre-specified categories, for example those used in clinical practice, are used as a discretization scheme. Apart from rounding, De la Haye et al. (2011) used internationally validated age- and gender-specific body mass index cut-offs to classify respondents as overweight or obese. In Chapter 2, we discussed similar cut-offs for psychological distress. As a pre-specified discretization usually only consists of a few categories, this approach leads to a loss of detail in attribute change.
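The equal-length and equal-population schemes described above can be sketched as follows. This is a minimal illustration, not the procedure of any of the cited studies: the function names, the quantile rule and the convention that a value equal to a cut-off moves to the higher category are all assumptions made for the example.

```python
from bisect import bisect_right


def equal_width_cutoffs(low, high, n_categories):
    """Interior cut-offs that split [low, high] into parts of equal length."""
    width = (high - low) / n_categories
    return [low + k * width for k in range(1, n_categories)]


def equal_population_cutoffs(values, n_categories):
    """Interior cut-offs at empirical quantiles (simple order-statistic rule)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_categories] for k in range(1, n_categories)]


def discretize(values, cutoffs):
    """Map each value to a category 1, 2, ...; a value on a cut-off goes up."""
    return [bisect_right(cutoffs, v) + 1 for v in values]


scores = [12, 15, 18, 22, 35, 41, 58, 63, 77, 94]  # made-up example data
print(discretize(scores, equal_width_cutoffs(0, 100, 5)))
print(discretize(scores, equal_population_cutoffs(scores, 5)))
```

With equal-width categories the category sizes follow the data; with quantile-based cut-offs every category is (roughly) equally populated, at the cost of discarding the shape of the original distribution. When many values are tied, the quantile cut-offs may coincide, so that fewer distinct categories can be formed.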

The examples given are only a small subset of the studies in which continuous dependent actor attributes had to be discretized for a stochastic actor-oriented model analysis. The choice for a particular number of categories and the width of these categories often involves an arbitrary element. Moreover, in most cases discretization leads to loss of information. Little is known about the effect of discretizing continuous variables on the results of stochastic actor-oriented model analyses.

In Chapters 2 and 3, we proposed an extension of the stochastic actor-oriented model for the co-evolution of social networks and continuous attributes. This model represents the dynamics of continuous attributes by a stochastic differential equation (Øksendal, 2000). In this chapter, we discuss the similarities and differences between the models for discrete and continuous actor attribute dynamics. In Section 6.2, we first present both models. Mathematically the models are quite different, but some of the parameters in the models can be interpreted similarly. We compare the models analytically in Section 6.3. We also present a first assessment of the effect of discretizing continuous attributes, both in real (Section 6.4) and simulated (Section 6.5) data.


6.2 Models for attribute dynamics

Let $X(t)$ denote the network among the $n$ actors ($i = 1, \ldots, n$) at time $t$, where $X_{ij}(t) = 1$ denotes the presence of a tie from actor $i$ to actor $j$ at time $t$ and $X_{ij}(t) = 0$ its absence. The attribute value of actor $i$ at time $t$ is denoted by $Z_i(t)$, and $Z(t) = (Z_1(t), \ldots, Z_n(t))$. The stochastic actor-oriented model models network-attribute co-evolution based on panel data of the network and actor attributes measured at times $t_1, \ldots, t_M$. These observations at discrete time points are assumed to be the outcomes of an underlying continuous-time process. The period between $t_m$ and $t_{m+1}$ is referred to as period $m$.

Section 6.2.1 presents the model used when actor attributes have discrete values, ranging between $c_{\min}$ and $c_{\max}$. Section 6.2.2 presents the stochastic differential equation model for continuous attribute evolution.

6.2.1 Discrete attribute evolution

In the stochastic actor-oriented model, the dynamics of discrete attributes, like the network dynamics, are modeled by a continuous-time Markov chain. Due to the Markov assumption, all network and attribute changes are conditionally independent of each other, given the current state of the process. This essentially separates the co-evolution process into a network change process and an attribute change process (Snijders et al., 2007). In most studies of network-attribute co-evolution these change processes are linked; the transition distribution of one process is affected by the current state of the other.

In the continuous-time Markov chain model, the time points at which actors change their attribute value are randomly determined. The waiting times between consecutive behavior changes for an actor are exponentially distributed with rate parameter $\lambda^{[Z]}_m$ in period $m$. This rate may also be a 'rate function' that depends on network or actor characteristics. Here we assume the rates to be constant across actors. Rate parameters account for possible heterogeneity in period lengths and allow us to model each period as having unit duration. The actual length of a period $m$ and the amount of behavior change during this period are absorbed by parameter $\lambda^{[Z]}_m$.

Discrete attribute change is assumed to occur in the smallest possible steps. Once an actor gets the opportunity to make an attribute change, its value can only increase or decrease by one, or remain constant. Suppose actor $i$ has the opportunity to make an attribute change and the current state of the attributes is $z$, then the set of attribute values to which $z$ can change is

$$\mathcal{B}_i(z) = \{\tilde z : \tilde z_j = z_j \text{ for all } j \neq i,\ |\tilde z_i - z_i| \le 1,\ c_{\min} \le \tilde z_i \le c_{\max}\}. \tag{6.1}$$

Alternative definitions of $\mathcal{B}_i(z)$ are possible. For example, actors may only be allowed to increase their attribute value. The conditional probability that actor $i$ changes the attribute state $z$ to $\tilde z \in \mathcal{B}_i(z)$ is

$$p_i(\tilde z \mid x, z) = \begin{cases} \exp\big(f_i^{[Z]}(x, \tilde z)\big) \Big/ \sum_{z' \in \mathcal{B}_i(z)} \exp\big(f_i^{[Z]}(x, z')\big) & \text{if } \tilde z \in \mathcal{B}_i(z), \\ 0 & \text{if } \tilde z \notin \mathcal{B}_i(z). \end{cases} \tag{6.2}$$

This multinomial logit model is commonly interpreted as following from utility maximization (McFadden, 1974). Suppose the utility actor $i$ attaches to a new attribute state $\tilde z$ is the sum of an objective function $f_i^{[Z]}(x, \tilde z)$ and a random term with standard Gumbel distribution. When actor $i$ aims to maximize this utility, the choice probabilities reduce to (6.2).

The objective function $f_i^{[Z]}(x, z)$ is given by a linear combination of effects $s_{ik}(x, z)$ that depend on the network and attribute state,

$$f_i^{[Z]}(x, z) = \sum_{k=1}^{K} \beta_k s_{ik}(x, z). \tag{6.3}$$

Effects may also depend on individual and dyadic covariates. The basic specification of the objective function contains a linear and a quadratic shape effect,

$$f_i^{[Z]}(x, z) = \beta_1 (z_i - \bar z) + \beta_2 (z_i - \bar z)^2, \tag{6.4}$$

where $\bar z$ denotes a reference value. In RSiena, it is set to the observed attribute value averaged over all actors and time points (Ripley et al., 2018). The objective function (6.4) captures the basic shape of the observed attribute distribution.
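To make the choice model concrete, the following sketch evaluates the probabilities (6.2) for the basic objective function (6.4). It is an illustration only and not RSiena code; the function names and the example parameter values are our own.

```python
import math


def objective(z_i, beta1, beta2, z_bar):
    """Basic objective function (6.4): linear plus quadratic shape effect."""
    return beta1 * (z_i - z_bar) + beta2 * (z_i - z_bar) ** 2


def choice_probabilities(z_i, beta1, beta2, z_bar, c_min, c_max):
    """Multinomial logit probabilities (6.2) over the admissible next values."""
    # Choice set: one step down, stay, one step up, clipped to [c_min, c_max].
    options = [c for c in (z_i - 1, z_i, z_i + 1) if c_min <= c <= c_max]
    weights = [math.exp(objective(c, beta1, beta2, z_bar)) for c in options]
    total = sum(weights)
    return {c: w / total for c, w in zip(options, weights)}


# Hypothetical example: five categories, reference value 3, negative quadratic shape.
print(choice_probabilities(z_i=1, beta1=0.0, beta2=-0.5, z_bar=3, c_min=1, c_max=5))
```

At the lowest category only two options remain (stay or move up), so the probabilities are normalized over two terms instead of three.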

6.2.2 Continuous attribute evolution

The dynamics of continuous variables in the stochastic actor-oriented model are modeled by a stochastic differential equation. In this model, attribute change is considered as occurring continuously in time. We model the continuous attribute dynamics of actors $i = 1, \ldots, n$ in a period $m$ by the stochastic differential equation

$$dZ_i(t) = \tau_m \Big[ a Z_i(t) + b_0 + \sum_{k=1}^{K} b_k u_{ik}(Z(t), X(t)) \Big] dt + \sqrt{\tau_m}\, dW_i(t). \tag{6.5}$$

The rate of change in the attribute $Z_i(t)$ of actor $i$ depends on the current level of the attribute through feedback parameter $a$. This parameter determines whether over time attributes will increase or decrease indefinitely ($a > 0$) or whether they are generally stable ($a < 0$).¹ Parameter $b_0$ is the intercept of the model. Parameters $b_k$ represent the strength of the effects $u_{ik}(Z(t), X(t))$ on the attribute change. Effects may depend on the network state, the attribute state of other actors $j \neq i$, and on individual and dyadic covariates.

Parameter $\tau_m$ is a period-specific parameter that accounts for possible heterogeneity in period lengths. The last term in equation (6.5) is a continuous-time error. A stochastic differential equation contains a random error term, to account for the fact that non-modeled 'random' factors affect the attribute evolution. When data collected at only two measurements are analyzed, we can alternatively specify model (6.5) by setting $\tau_1 = 1$ and letting an error parameter $g$ represent the size of the random error.
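As an illustration of equation (6.5), the sketch below simulates a single attribute path over one period of unit duration, leaving out the effects $u_{ik}$ so that only feedback, intercept and noise remain. The Euler-Maruyama discretization, the function name and the parameter values are assumptions made for this example; this is not the estimation or simulation procedure used in this thesis.

```python
import math
import random


def simulate_attribute(z0, a, b0, tau=1.0, n_steps=1000, rng=None):
    """Euler-Maruyama path of dZ = tau*(a*Z + b0) dt + sqrt(tau) dW over one period."""
    if rng is None:
        rng = random.Random()
    dt = 1.0 / n_steps  # each period is modeled as having unit duration
    z = z0
    for _ in range(n_steps):
        drift = tau * (a * z + b0) * dt
        noise = math.sqrt(tau * dt) * rng.gauss(0.0, 1.0)
        z += drift + noise
    return z


# Hypothetical parameter values: negative feedback pulls values toward -b0/a = 3.
rng = random.Random(1)
print(simulate_attribute(z0=0.0, a=-1.0, b0=3.0, tau=5.0, rng=rng))
```

With negative feedback the simulated values fluctuate around $-b_0/a$; with positive feedback the drift pushes values away from that point.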

6.3 Analytical comparison

The mathematical frameworks of the two models introduced above are different, but the parameters in the models have much in common (see Table 6.1). Moreover, many of the effects $s_{ik}$ that have been formulated for the study of discrete attribute evolution (Ripley et al., 2018) can be translated straightforwardly to effects $u_{ik}$ in the stochastic differential equation.

The rate and scale parameters both account for heterogeneity in period lengths. The stochastic actor-oriented model models each period as having length one; the actual period length is captured by the rate and scale parameters. The rate at which actors get the opportunity to change their attribute value and the direction of change are modeled independently in the discrete attribute evolution model. The stochastic differential equation, on the other hand, integrates both functions.

Hereafter we derive the stationary distributions of the basic versions of the two models, to gain insight into how their parameters are related. Generally, parameters in stochastic actor-oriented models are estimated based on longitudinal data measuring a process of network and attribute change. In this context, stationary distributions of stochastic actor-oriented models are not the main focus. They can be useful, however, when modeling cross-sectional observations of social networks. Snijders and Steglich (2015) model cross-sectional network observations using the stationary distribution of a stochastic actor-oriented model as a probability distribution.

¹ Note that we are using a simplified picture of stability here, based on the concept of stability for ordinary differential equations. See Has'minskiĭ (1980) for a (technical) discussion of the stochastic stability of differential equations.


Table 6.1: Similarities between the parameters in the models for discrete and continuous attribute dynamics.

| Discrete | Continuous | Common function |
|---|---|---|
| rate $\lambda^{[Z]}_m$ | scale $\tau_m$ | Accounts for heterogeneity in period lengths and makes it possible to model each period $t_m$ to $t_{m+1}$ as having unit duration |
| linear shape $\beta_1$ | intercept $b_0$ | Determines the location in the attribute distribution (also the peak if $\beta_2, a < 0$) |
| quadratic shape $\beta_2$ | feedback $a$ | Determines the shape of the distribution: $\beta_2, a < 0$ indicates negative feedback (extreme values will be pushed to the center of the distribution); $\beta_2, a > 0$ indicates positive feedback (attribute values will be pushed to the extremes) |

6.3.1 Stationary distributions

Calculating the general stationary distribution for a stochastic actor-oriented co-evolution model is not feasible. However, by assuming that the network has no effect on the attribute dynamics, we can decouple the attribute model from the network model and study its stationary behavior separately. This is also possible if the network is constant and the effects $u_{ik}(Z(t), X(t))$ are not a function of $Z(t)$. In this case, a network-related effect $u_{ik}(X(t))$ reduces to a constant covariate.

Discrete attributes

The objective function corresponding to a basic model for discrete attribute dynamics that includes one constant covariate $v_i$ is of the form

$$f_i^{[Z]}(z) = \beta_1 (z_i - \bar z) + \beta_2 (z_i - \bar z)^2 + \beta_3 v_i z_i. \tag{6.6}$$

The probability that an actor increases his attribute value from $c$ to $c + 1$ is

$$p(c \to c + 1; \beta) = \frac{e^{f(c+1)}}{e^{f(c-1)} + e^{f(c)} + e^{f(c+1)}}, \tag{6.7}$$

for $c = c_{\min} + 1, \ldots, c_{\max} - 1$, where $f(c)$ denotes the objective function (6.6) evaluated at $z_i = c$. The probability that an actor decreases his attribute value from $c + 1$ to $c$ is

$$p(c + 1 \to c; \beta) = \frac{e^{f(c)}}{e^{f(c)} + e^{f(c+1)} + e^{f(c+2)}}, \tag{6.8}$$

for $c = c_{\min}, \ldots, c_{\max} - 2$. For the boundaries, we have that

$$p(c_{\min} \to c_{\min} + 1; \beta) = \frac{e^{f(c_{\min}+1)}}{e^{f(c_{\min})} + e^{f(c_{\min}+1)}} \tag{6.9}$$

and

$$p(c_{\max} \to c_{\max} - 1; \beta) = \frac{e^{f(c_{\max}-1)}}{e^{f(c_{\max})} + e^{f(c_{\max}-1)}}. \tag{6.10}$$

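Denote the stationary probabilities by $\pi(c)$. The detailed balance equations (Norris, 1997) provide a sufficient condition for determining the values of $\pi(c)$,

$$\pi(c)\, p(c \to c + 1; \beta) = \pi(c + 1)\, p(c + 1 \to c; \beta), \tag{6.11}$$

where $c = c_{\min}, \ldots, c_{\max} - 1$. The probabilities $\pi(c)$ can be determined recursively, using

$$\frac{\pi(c+1)}{\pi(c)} = \frac{1 + e^{\beta_1 + \beta_2 + 2\beta_2 (c - \bar z) + \beta_3 v_i} + e^{2(\beta_1 + 2\beta_2 (c + 1 - \bar z) + \beta_3 v_i)}}{1 + e^{-\beta_1 - \beta_2 - 2\beta_2 (c - \bar z) - \beta_3 v_i} + e^{-2(\beta_1 + 2\beta_2 (c - \bar z) + \beta_3 v_i)}}, \tag{6.12}$$

for $c = c_{\min} + 1, \ldots, c_{\max} - 2$, and similar expressions on the boundaries, and

$$\pi(c) = \pi(c_{\min}) \prod_{\tilde c = c_{\min}}^{c - 1} \frac{\pi(\tilde c + 1)}{\pi(\tilde c)}, \tag{6.13}$$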

which is a telescoping product. Since the probabilities sum to 1, these equations fully specify the stationary distribution for the discrete attribute dynamics.

Figure 6.1 shows six examples of stationary distributions of a five-category variable, with reference value $\bar z = 3$ and no covariate effect. The top four figures illustrate that a zero linear shape effect implies that the stationary distribution is centered around $\bar z$. Increasing the quadratic shape parameter $\beta_2$ causes the distribution to become less peaked. In Figure 6.1c, where $\beta_2$ is slightly larger than 0, the stationary distribution is almost uniform. When we increase $\beta_2$ further, the stationary distribution becomes bimodal (see Figure 6.1d). The bottom figures in Figure 6.1 illustrate that increasing $\beta_1$ moves the center of the distribution to the right. The objective function attains its maximum at $\bar z - \beta_1/(2\beta_2)$, which is 4 and 4.7 for Figures 6.1e and 6.1f respectively.
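The recursion based on detailed balance is straightforward to evaluate numerically. The sketch below is a minimal illustration with names of our own choosing; it builds the transition probabilities from the objective function (6.6), with the choice set clipped at the boundaries, and normalizes the resulting weights.

```python
import math


def stationary_distribution(beta1, beta2, z_bar, c_min, c_max, beta3=0.0, v=0.0):
    """Stationary probabilities of the basic discrete attribute model via detailed balance."""

    def f(c):  # objective function (6.6) evaluated at z_i = c
        return beta1 * (c - z_bar) + beta2 * (c - z_bar) ** 2 + beta3 * v * c

    def p(c, target):  # probability of moving from c to target, as in (6.2)
        options = [k for k in (c - 1, c, c + 1) if c_min <= k <= c_max]
        return math.exp(f(target)) / sum(math.exp(f(k)) for k in options)

    weights = [1.0]  # unnormalized pi(c_min)
    for c in range(c_min, c_max):
        # Detailed balance (6.11): pi(c + 1) / pi(c) = p(c -> c + 1) / p(c + 1 -> c)
        weights.append(weights[-1] * p(c, c + 1) / p(c + 1, c))
    total = sum(weights)
    return [w / total for w in weights]


for beta2 in (-0.5, -0.2, 0.05, 0.2):
    pi = stationary_distribution(0.0, beta2, z_bar=3, c_min=1, c_max=5)
    print(beta2, [round(x, 3) for x in pi])
```

For $\beta_1 = \beta_2 = 0$ the boundary categories are less likely than the interior ones (the weights are proportional to 2, 3, 3, 3, 2), which is consistent with a nearly uniform distribution requiring a quadratic shape parameter slightly above 0.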

Note that the combination of reference value $\bar z = 3$ and the stationary distribution depicted in Figure 6.1f is somewhat artificial. Setting $\bar z$ to the mean in

[Figure 6.1: six panels showing probabilities (vertical axis, 0.0 to 0.6) for categories 1 to 5 (horizontal axis). Panels: (a) $\beta_1 = 0$ and $\beta_2 = -0.5$; (b) $\beta_1 = 0$ and $\beta_2 = -0.2$; (c) $\beta_1 = 0$ and $\beta_2 = 0.05$; (d) $\beta_1 = 0$ and $\beta_2 = 0.2$; (e) $\beta_1 = 1$ and $\beta_2 = -0.5$; (f) $\beta_1 = 1.7$ and $\beta_2 = -0.5$.]

Figure 6.1: Stationary distributions corresponding to objective function (6.6), with $\beta_3 = 0$ and $\bar z = 3$.

Continuous attributes

The stationary distribution for the stochastic differential equation model for continuous attribute dynamics,

$$dZ_i(t) = \tau_m [a Z_i(t) + b_0 + b_1 v_i]\, dt + \sqrt{\tau_m}\, dW_i(t), \tag{6.14}$$

is a normal distribution $\mathcal{N}(\mu, \sigma^2)$ with

$$\mu = -\frac{b_0 + b_1 v_i}{a}, \tag{6.15}$$

$$\sigma^2 = -\frac{1}{2a}. \tag{6.16}$$

A stationary distribution only exists when feedback parameter $a$ is negative. If $a$ is positive, the attribute values will increase or decrease indefinitely for $t \to \infty$. Both a negative quadratic shape parameter and a negative feedback parameter, in models with a basic specification (i.e., only the parameters in Table 6.1), give rise to a unimodal stationary distribution. A positive quadratic shape parameter in (6.6) gives rise to a bimodal stationary distribution. A positive feedback parameter in (6.14) does not.

6.3.2 Comparison

Figure 6.2 shows the normal distributions with mean (6.15), with $b_1$ set to 0, and variance (6.16) that most closely match the distributions in Figure 6.1. The feedback parameters $a$ and intercept parameters $b_0$ are estimated by minimizing the sum over the five categories $c$ of the squared deviations between the normal probability masses between $c - 0.5$ and $c + 0.5$, and the discrete category probabilities $\pi(c)$. The means $-b_0/a$ of the top four distributions are 3 and the means of the bottom two distributions are 3.97 and 4.56. Large negative values of $a$ correspond to a peaked distribution. The nearly uniform distribution (Figure 6.2c) and the bimodal distribution (Figure 6.2d) are not well represented by a normal distribution. The same holds for the skewed distribution in Figure 6.2f.
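The least-squares matching described above can be sketched as follows. The coarse grid search and all names are assumptions made for this illustration; the optimizer actually used to produce Figure 6.2 is not specified here. The example probabilities are the stationary probabilities of the discrete model with $\beta_1 = 0$, $\beta_2 = -0.5$ and $\bar z = 3$, rounded to five decimals.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def squared_deviation(mu, sigma, categories, probs):
    """Sum of squared gaps between normal masses on [c - 0.5, c + 0.5] and pi(c)."""
    total = 0.0
    for c, p in zip(categories, probs):
        mass = normal_cdf(c + 0.5, mu, sigma) - normal_cdf(c - 0.5, mu, sigma)
        total += (mass - p) ** 2
    return total


def fit_normal(categories, probs, step=0.01):
    """Coarse grid search over (mu, sigma); returns the implied pair (a, b0)."""
    best = None
    n_mu = int(round((max(categories) - min(categories)) / step))
    for i in range(n_mu + 1):
        mu = min(categories) + i * step
        for j in range(1, 301):  # sigma runs from step to 300 * step
            sigma = j * step
            loss = squared_deviation(mu, sigma, categories, probs)
            if best is None or loss < best[0]:
                best = (loss, mu, sigma)
    _, mu, sigma = best
    a = -1.0 / (2.0 * sigma ** 2)  # from variance (6.16)
    b0 = -a * mu                   # from mean (6.15) with b1 = 0
    return a, b0


pi = [0.02218, 0.23338, 0.48888, 0.23338, 0.02218]
a, b0 = fit_normal([1, 2, 3, 4, 5], pi)
print(round(a, 2), round(b0, 2))
```

A gradient-based optimizer would be faster and more precise; the grid search is used here only to keep the example dependency-free.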

Figure 6.3 shows three distributions with mean $\mu = -b_0/a = 3$. In case we consider the discrete model parameters for the closest discrete stationary distribution, with $\bar z = \mu$, the linear shape parameter $\beta_1$ is 0. The quadratic shape parameters corresponding to $a = -0.5$, $-1$ and $-1.5$ are $\beta_2 = -0.25$, $-0.59$ and $-0.94$.

In conclusion, the most basic specification of the stochastic differential equation can properly represent stationary distributions that are unimodal and fairly symmetric. The range of stationary distributions that can be modeled by the basic discrete attribute model is broader. However, three remarks are to be made here. First, we have only compared very simple models. Covariates in a stochastic differential equation may be able to explain multimodality (for example, if different peaks represent different groups). Second, skewed distributions of continuous variables can sometimes be symmetrized by an appropriate transformation. Third, the normal stationary distribution rests on our particular choice for attribute model (6.5), in particular, the choice of the diffusion coefficient. By selecting a different diffusion coefficient we can obtain, for example, a Beta distribution or a Gamma distribution as the stationary distribution of a stochastic differential equation. Bibby, Skovgaard, and Sørensen (2005) show how many common distributions can be obtained as stationary distribution by a suitable choice of stochastic differential equation.

[Figure 6.2: six panels showing fitted normal densities (vertical axis, 0.0 to 0.6) over the attribute range 1 to 5 (horizontal axis). Panels: (a) $a = -0.86$ and $b_0 = 2.59$; (b) $a = -0.44$ and $b_0 = 1.31$; (c) $a = -0.18$ and $b_0 = 0.54$; (d) $a = -0.017$ and $b_0 = 0.051$; (e) $a = -0.95$ and $b_0 = 3.77$; (f) $a = -1.25$ and $b_0 = 5.77$.]

Figure 6.2: Normal distributions with mean (6.15), with $b_1 = 0$, and variance (6.16) fitted to the distributions in Figure 6.1. The corresponding estimates of feedback parameter $a$ and intercept $b_0$ are given.

6.4 Real data study

In this study, we investigate the effect of discretizing continuous attributes in the stochastic actor-oriented model based on school class data collected by Knecht (2004). We base our analysis on the study of the co-evolution of the friendship ties between adolescents and their academic performance, in terms of mathematics grades, discussed in Chapter 4. We use the same model specification as was used in Chapter 4 and focus only on the 33 classes for which converged estimates were obtained.

For these classes, we discretize the mathematics grade data in three different ways. We compare the results of the analyses in which grades are treated as continuous variables to those where they are treated as discrete variables. The discretization of a continuous variable may affect the range of the variable, and thus the scale of the estimated grade-related parameters. Therefore, we compare the results based on the t-ratios of the estimated parameters.

Figure 6.3: Normal distributions with mean 3 and variance $-1/(2a)$, with $a = -0.5$ (solid line), $a = -1$ (dashed) and $a = -1.5$ (dotted).

6.4.1 Treatments of the grade data

In the study, mathematics grades were measured on a scale from 10 to 100. While modeling the co-evolution of friendship and mathematics grades, we treat the grade data in four ways:

1. (continuous) by dividing the grades by 10, and centering them around the overall grade average;

2. (discrete) by dividing the grades by 5, and rounding (18 categories);

3. (discrete) by dividing the grades by 10, and rounding (9 categories);

4. (discrete) by splitting the grade scale into 5 equally populated categories.

Treatment 1 was used in Chapter 4. Figure 6.4 shows the grade distributions for the four treatments of the grade data. In Figure 6.4a, we see peaks at the integers. Multiples of 10 occur very regularly throughout the data set. For only a few classroom-measurement combinations the raw grade scores were all multiples of 10.

For 14 classrooms we could obtain 5 roughly equally populated categories. As some values occur very regularly throughout the data, in some classes it was impossible to identify 4 unique cut-off points for the quintiles.

[Figure 6.4: four histograms with frequency on the vertical axis. Panels: (a) Continuous; (b) Discrete (18); (c) Discrete (9); (d) Discrete (5).]

Figure 6.4: Grade distributions. The number of categories in the discrete distributions is given in parentheses.

6.4.2 Results

We specified all models as was done in Chapter 4. Table 6.2 shows how many of the analyses per treatment of the attribute data resulted in non-convergence or exceptionally high, unreliably estimated standard errors, indicated by high condition numbers of the Jacobian. The problem of inflated standard errors was discussed in Chapter 5.

Table 6.2: The number of analyses resulting in non-convergence or unreliable standard error estimates, and the number of adequately estimated models.

| | total | not converged | s.e. issues | n |
|---|---|---|---|---|
| discrete: 18 categories | 33 | 1 | 4 | 28 |
| discrete: 9 categories | 33 | 1 | 4 | 28 |

Figure 6.5 compares the t-ratios corresponding to six parameters in the models of friendship and grade dynamics. The t-ratios obtained for the model with grades treated as a continuous variable are given on the horizontal axis, and the vertical axis gives the t-ratios when grades are discretized. Values corresponding to the same class can be found along an imaginary vertical line. The solid line indicates where t-ratios are equal.

Figure 6.5a shows that discretizing the grade variable has little effect on the t-ratios for the reciprocity parameter. The same is true for the other structural effects and for the gender homophily effect in the friendship dynamics model. The discretizations do affect the t-ratios for the grade-related parameters. In the network evolution part of the model, the discretization into 18 categories affects the t-ratios the least.

For the average alter effect in the grade dynamics model, the effect of the discretizations is largest, as is shown in Figure 6.5f. This is the effect for which the points are scattered the furthest away from the diagonal. This might well be due to the fact that discretization affects the average alter effect in two ways; it affects the grade of an individual as well as the average grade of the individual's friends.

Table 6.3 summarizes the differences between the t-ratios numerically. For pairs of t-ratios, it presents their average absolute difference and their coefficient of identity. The coefficient of identity for two variables $X$ and $Y$ is defined by Zegers and Ten Berge (1985) as

$$I_{X,Y} = 1 - \frac{E((X - Y)^2)}{E(X^2) + E(Y^2)} = \frac{2\, E(XY)}{E(X^2) + E(Y^2)}. \tag{6.17}$$

If the agreement between the t-ratios $t_c$ for the models with grade treated as a continuous variable and the t-ratios $t_d$ for the models with grade treated as a discrete variable with $d$ categories is high, then their coefficient of identity $I_{t_c,t_d}$ will be close to 1.
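Both summary measures are easy to compute from paired t-ratios. The sketch below replaces the expectations in (6.17) by sample sums, which leaves the ratio unchanged; the function names and the example values are made up for illustration and are not taken from the data analyzed here.

```python
def coefficient_of_identity(x, y):
    """Sample version of (6.17): 1 - sum((x - y)^2) / (sum(x^2) + sum(y^2))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    numerator = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    denominator = sum(xi ** 2 for xi in x) + sum(yi ** 2 for yi in y)
    return 1.0 - numerator / denominator


def average_absolute_difference(x, y):
    """Mean of |x_i - y_i| over all pairs."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / len(x)


t_continuous = [1.8, -0.4, 2.3, 0.9]  # hypothetical t-ratios
t_discrete = [1.6, -0.1, 2.4, 0.5]    # hypothetical t-ratios
print(coefficient_of_identity(t_continuous, t_discrete))
print(average_absolute_difference(t_continuous, t_discrete))
```

Unlike a correlation coefficient, the coefficient of identity only equals 1 when the two series are identical, so it also penalizes differences in level and scale.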

Table 6.3 shows that the coefficient of identity is close to 1 for most of the effects in the friendship dynamics model. For the grade-related effects on friendship evolution, the differences in t-ratios are largest. The same pattern is visible for the average absolute differences. As was shown in Figures 6.5b, 6.5c and 6.5d, the discretization into 18 categories affects the t-ratios the least. The differences increase as the number of categories decreases. The t-ratio differences for the parameters in the grade dynamics models are very large, especially for the average alter effect, as was shown in Figures 6.5e and 6.5f. Also for the effects in the grade dynamics model, the coefficient of identity with the t-ratios corresponding to the continuous grade treatment is largest for the 18 category discretization.

For some classes, the discretizations result in different conclusions based on

[Figure 6.5: six scatter plots of discrete t-ratios (vertical axis) against continuous t-ratios (horizontal axis). Panels: (a) Reciprocity; (b) Grade ego; (c) Grade alter; (d) Grade ego × alter; (e) Gender (grade); (f) Average alter (grade).]

Figure 6.5: Comparison of t-ratios of the parameters estimated for the Knecht (2004) data with continuous grades and with discretized grades. The blue plusses, green triangles and red circles indicate the 18, 9 and 5 category estimates respectively. Figures (e) and (f) concern grade dynamics parameters.

Table 6.3: Average absolute difference (aad) and coefficient of identity $I_{t_c,t_d}$ as in (6.17) of the t-ratios, comparing the models with continuous grade variables and with discrete grade variables with 18, 9 and 5 categories, based on respectively 28, 28 and 13 analyses.

| | aad $t_{18}$ | aad $t_9$ | aad $t_5$ | $I_{t_c,t_d}$ $t_{18}$ | $I_{t_c,t_d}$ $t_9$ | $I_{t_c,t_d}$ $t_5$ |
|---|---|---|---|---|---|---|
| Friendship dynamics | | | | | | |
| rate period 1 | 0.098 | 0.066 | 0.085 | 1.00 | 1.00 | 1.00 |
| rate period 2 | 0.110 | 0.108 | 0.103 | 1.00 | 1.00 | 1.00 |
| density | 0.113 | 0.146 | 0.160 | 1.00 | 1.00 | 1.00 |
| reciprocity | 0.123 | 0.132 | 0.085 | 1.00 | 1.00 | 1.00 |
| transitivity | 0.120 | 0.130 | 0.124 | 1.00 | 1.00 | 1.00 |
| transitivity × reciprocity | 0.058 | 0.063 | 0.096 | 1.00 | 1.00 | 1.00 |
| outdegree activity | 0.069 | 0.069 | 0.109 | 1.00 | 1.00 | 0.99 |
| outdegree popularity | 0.111 | 0.133 | 0.112 | 1.00 | 1.00 | 1.00 |
| indegree popularity | 0.063 | 0.078 | 0.119 | 1.00 | 0.99 | 0.99 |
| same gender | 0.063 | 0.069 | 0.050 | 1.00 | 1.00 | 1.00 |
| grades ego | 0.134 | 0.220 | 0.482 | 0.99 | 0.97 | 0.76 |
| grades alter | 0.122 | 0.231 | 0.287 | 0.99 | 0.95 | 0.91 |
| grades ego × grades alter | 0.112 | 0.256 | 0.391 | 0.99 | 0.94 | 0.89 |
| Grade dynamics | | | | | | |
| average alter | 0.326 | 0.371 | 0.405 | 0.71 | 0.67 | 0.68 |
| gender (male) | 0.173 | 0.201 | 0.401 | 0.96 | 0.94 | 0.77 |

in Figure 6.5b) and the grade ego × grade alter effect (Figure 6.5d). However, based on the analyses of the 33 classrooms we cannot conclude that this is a general consequence of discretizing continuous variables. In the next section, we study the effect of discretizing continuous actor attributes in a more controlled setting.

6.5 Simulation study

We consider the effect of discretizing continuous attributes in a simulation study, using the context of the study on friendship and psychological distress among adolescents in Chapter 2. The general structure of the simulation study is as follows. First, we simulate the co-evolution of the network and the continuous actor attributes multiple times. Then, for each simulated data set we discretize the actor attributes in various ways, using different numbers of categories. We fit a stochastic actor-oriented model to each of the simulated data sets with discretized attributes and with the original continuous attribute data. After checking whether parameters are re-estimated correctly, we compare the different ways of handling continuous variables, based on t-ratio comparisons and considering power estimates.

[Figure 6.6: two histograms with frequency on the vertical axis. Panels: (a) Distress distribution (horizontal axis: distress score); (b) Distress change distribution (horizontal axis: distress score difference).]

Figure 6.6: Distributions of the distress scores at the first measurement and the change between the first and second measurement.

Each simulated data set consists of one period of network-attribute co-evolution, starting from the observed first measurement of the friendship network and psychological distress data collected by Doddema (2014). The distribution of the distress scores at the first measurement is shown in Figure 6.6a. This distribution is somewhat right-skewed. This does not violate any assumptions of the stochastic differential equation model for attribute change; the change in the distress scores is fairly normally distributed (Figure 6.6b). We use a slightly simplified model specification, compared to the model estimated in Chapter 2, leaving out some effects for which the estimated parameter was very small.

6.5.1 Study design

The simulation parameters are given in Table 6.4. We consider three simulation models, one with a high average alter parameter of 0.9, one with a lower parameter of 0.5, and one with zero average alter effect. Each of the three models is simulated 200 times. For each simulated data set, the actor attributes are treated in three ways:

2. (discrete) by adding 2.1 to the simulated attributes, and splitting the scale into eight categories: <1.5, 1.5–2, . . . , 4–4.5, >4.5.

3. (discrete) by adding 2.1 to the simulated attributes, and splitting the scale into fourteen categories: <1.5, 1.5–1.75, . . . , 4.25–4.5, >4.5.

The value 2.1 is added to compensate for the fact that we center the attributes before the simulation. A stochastic actor-oriented model is re-estimated on each of the resulting 200 × 3 × 3 = 1800 data sets.
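As an illustration of the two discretization treatments, the following minimal Python sketch maps a centered simulated value to a category. The function names, the category labels 1, 2, ..., and the choice to assign a score that falls exactly on a boundary to the higher category are assumptions made for this sketch; the text does not specify them.

```python
from bisect import bisect_right

SHIFT = 2.1  # value added back to the centered attributes (from the text)


def cutpoints(n_categories):
    """Boundaries between categories: 1.5, ..., 4.5 in equal steps.

    Gives steps of 0.5 for 8 categories and 0.25 for 14 categories.
    """
    n_cuts = n_categories - 1
    width = 3.0 / (n_cuts - 1)
    return [1.5 + i * width for i in range(n_cuts)]


def discretize(centered_value, n_categories):
    """Map a centered attribute value to a category label 1..n_categories.

    Assumption: a score exactly on a boundary goes to the higher category.
    """
    score = centered_value + SHIFT
    return bisect_right(cutpoints(n_categories), score) + 1


# A centered value of 0.0 corresponds to a distress score of 2.1.
category_8 = discretize(0.0, 8)
category_14 = discretize(0.0, 14)
```

With these boundaries, the lowest and highest categories are open-ended (<1.5 and >4.5), as in the list above.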

Centering the attribute values before simulation allows us to leave out the usual centering in the average alter effect when re-estimating our model with a continuous attribute variable. As the estimated attribute mean varies over the simulated data sets, we thus remove a source of variation.

The simulation parameters in Table 6.4 are similar in size to the parameter estimates in Chapter 2, with the exception of the intercept parameter b0. The difference can be explained by two changes we made to the stochastic differential equation model. First, we model social influence by an average alter effect instead of a minimum alter effect, as the average alter effect is more commonly used. Moreover, for discretizations with few categories the minimum alter values are often equal for nearly all actors, making the minimum alter effect practically indistinguishable from the intercept effect. Second, we center the attribute values before simulation. Both changes affect the intercept parameter.

Table 6.4: Simulation parameters.

parameter                          value
Network dynamics
  rate                             11.9
  density                          −3.2
  reciprocity                       2.8
  transitivity (gwesp)              1.8
  cyclicity (gwesp)                −0.8
  outdegree popularity (sqrt)      −0.1
  outdegree activity (sqrt)        −0.1
  same gender                       0.3
  same class                        0.4
  attribute ego                    −0.2
Attribute dynamics
  feedback a                       −0.4
  intercept b0                      0.1
  error g                           0.4

The difference between the mean average alter value observed at the first measurement and the mean minimum alter value is 0.88. To make sure that the attribute values simulated by a model with an average alter effect are in a reasonable range, and given that we first select an average alter parameter of 0.9 in our simulation model, we therefore subtract 0.9 × 0.88 from the original intercept estimate 1.79, obtained in Chapter 2.

The average observed distress value over the first two measurements is approximately 2.1. Centering the observed scores around this value also decreases the intercept. This can be seen as follows. If a function $z(t)$ satisfies the differential equation

\[
\frac{dz(t)}{dt} = a\,z(t) + b, \tag{6.18}
\]

then the transformation $\tilde{z}(t) = z(t) - c$ satisfies

\[
\frac{d\tilde{z}(t)}{dt} = \frac{dz(t)}{dt} = a\,z(t) + b = a\,\tilde{z}(t) + (b + ac). \tag{6.19}
\]
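The substitution in (6.19) can be checked numerically. The Python sketch below integrates the deterministic equation with a simple Euler scheme, once for $z$ and once for the centered variable with intercept $b + ac$, and compares the end points. The Euler scheme, the starting value 2.5, and the function name are assumptions made for illustration. The feedback parameter is taken as $a = -0.4$, the sign for which centering around 2.1 lowers the intercept, and $b$ is the Chapter 2 intercept after subtracting 0.9 × 0.88.

```python
def euler_endpoint(a, b, z0, t_end=1.0, steps=1000):
    """Euler approximation of z(t_end) for dz/dt = a * z + b, z(0) = z0."""
    h = t_end / steps
    z = z0
    for _ in range(steps):
        z += h * (a * z + b)
    return z


a = -0.4                 # feedback parameter (negative sign assumed, see lead-in)
c = 2.1                  # centering value from the text
b = 1.79 - 0.9 * 0.88    # intercept after replacing minimum by average alter

z_end = euler_endpoint(a, b, z0=2.5)
z_tilde_end = euler_endpoint(a, b + a * c, z0=2.5 - c)

# The centered path should equal the original path shifted by c.
gap = abs(z_tilde_end - (z_end - c))

# Intercept of the centered equation: b + a * c, roughly 0.16.
centered_intercept = b + a * c
```

Under these assumptions the two adjustments together bring the intercept from 1.79 down to roughly 0.16, the same order of magnitude as the value 0.1 in Table 6.4.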

As we set the feedback parameter a to −0.4, centering the attribute values around 2.1 decreases the intercept parameter by 0.4 × 2.1. These changes explain the order of magnitude change in the intercept parameter in Table 6.4.

6.5.2 Results

Table 6.5 shows the number of non-converged models and the number of converged models for which standard errors were estimated unreliably. We identified potential standard error issues by considering outliers in the condition number of the Jacobian and inspecting histograms of the standard errors. For the 8 categories treatment 48 data sets and for the 14 categories treatment 2 data sets did not yield convergence. Non-convergence was due to the divergence of the average alter parameter during estimation. Estimation was therefore halted once this parameter exceeded 15. Earlier trials had indicated that in such cases the parameter would otherwise keep increasing until numerical problems stopped the estimation. For the data sets that yielded non-convergence we ran the analysis again, fixing the average alter parameter to zero and testing it using a score-type test (Schweinberger, 2012). These analyses yielded convergence for all but one model (8 categories, average alter 0.9). The results of the analyses with score-type tests will not be used in the comparison of parameter estimates and t-ratios, as the average alter parameter was fixed to zero. However, they will be used for estimating power to make comparisons fair, as the originally non-converged results indicated a strongly positive average alter effect.

Table 6.5: The number of simulated data sets (out of 200) resulting in model non-convergence or unreliable standard error estimates.

                            not converged        s.e. issues
average alter               0.9   0.5   0        0.9   0.5   0
continuous                    0     0   0          1     1   2
discrete: 14 categories       2     0   0          6     2   6
discrete: 8 categories       42     6   0         15    12   7

The distribution of the condition number of the Jacobian for all treatments of the attribute data and influence levels is given in Figure 6.7. For all models, the standard errors, and thus the Jacobian, are computed based on N = 40,000 simulations in Phase 3 of the estimation algorithm (see Section 4.2 for a discussion of the estimation algorithm). The condition number distributions for the models in which the actor attributes are treated as continuous variables are narrow and symmetric.

Figure 6.7: Distribution of the condition number of the Jacobian for the three treatments of the attribute data and the three average alter levels. Histograms, one panel per combination of treatment (cont, discr14, discr8) and average alter level (avalt = 0.9, 0.5, 0); horizontal axis: 10log(condition number), vertical axis: frequency.


Table 6.6: Simulation results: average estimates (θ̂) and root mean squared errors (rmse) for the models with continuous attribute data, for three values of the average alter (avalt) parameter b1.

                                   avalt = 0.9      avalt = 0.5      avalt = 0
parameter                  θ       θ̂      rmse     θ̂      rmse     θ̂      rmse
Network dynamics
  rate                    11.9    11.77   0.67    11.82   0.73    11.72   0.75
  density                 −3.2    −3.15   0.31    −3.16   0.29    −3.21   0.31
  reciprocity              2.8     2.82   0.17     2.80   0.16     2.80   0.15
  transitivity             1.8     1.81   0.12     1.79   0.13     1.80   0.11
  cyclicity               −0.8    −0.79   0.13    −0.78   0.12    −0.79   0.11
  outdegree popularity    −0.1    −0.12   0.07    −0.10   0.06    −0.10   0.07
  outdegree activity      −0.1    −0.11   0.05    −0.11   0.05    −0.10   0.05
  same gender              0.3     0.30   0.08     0.30   0.07     0.30   0.08
  same class               0.4     0.40   0.08     0.40   0.08     0.39   0.09
  attribute ego           −0.2    −0.20   0.08    −0.21   0.08    −0.21   0.09
Attribute dynamics
  feedback a              −0.4    −0.40   0.07    −0.40   0.07    −0.40   0.07
  intercept b0             0.1     0.10   0.04     0.09   0.04     0.11   0.04
  error g                  0.4     0.40   0.03     0.39   0.03     0.39   0.03
  average alter b1                 0.84   0.20     0.46   0.20     0.03   0.23
                                  (n = 199)       (n = 199)       (n = 198)

For the 14 category discretization with average alter 0.9 and for the 8 category discretization, however, the condition number distribution is very skewed. This indicates that these models were hard to estimate for the simulated data sets under study. This finding corresponds to the distribution of non-converged models over the conditions, given in Table 6.5: when parameters are hard to estimate, standard errors are also likely to be hard to estimate. For each of the conditions, we discard some analyses resulting in estimates with exceptionally high standard errors (see Table 6.5), based on outliers in standard error histograms.

Table 6.6 shows that the parameters in the co-evolution model of networks and continuous attributes are re-estimated well, for all average alter levels. This is also shown in Figure 6.8, which depicts boxplots of the estimated parameters in the network dynamics part of the model, for all average alter levels and all treatments of the attribute data. Note that there is no systematic deviation in the parameter estimates based on the treatment of the attribute data or on the average alter parameter. Only the attribute ego parameter decreases in size, because of the increase in the attribute range for the discretization treatments.

Boxplots of the attribute parameter estimates are given in Figure 6.9. Figure 6.9a graphically depicts the finding in Table 6.6 that the continuous attribute model parameters are generally re-estimated well, and that the deviations from the simulation value are largest for the average alter parameter. The average alter parameter is consistently slightly underestimated.
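The two summaries reported in Table 6.6 can be sketched in a few lines of Python. The function name and the five estimates below are hypothetical and serve only to show the computation; they are not results from the simulation study.

```python
from math import sqrt
from statistics import fmean


def summarize(estimates, true_value):
    """Return (average estimate, root mean squared error) over replications."""
    average = fmean(estimates)
    rmse = sqrt(fmean((e - true_value) ** 2 for e in estimates))
    return average, rmse


# Hypothetical average alter estimates from five converged replications,
# simulated with a true average alter parameter of 0.9.
estimates = [0.70, 0.95, 0.80, 1.05, 0.75]
average, rmse = summarize(estimates, true_value=0.9)
```

An average estimate below the simulation value, as in this made-up example, corresponds to the slight underestimation described above.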

The estimates for the models with discretized actor attributes are given in Figure 6.9b. We see that with increasing number of categories, the attribute rate parameter increases. Increasing the number of categories decreases the average alter parameter, as was true for the attribute ego parameter. The difference between 0.9 and 0.5 for the average alter parameter in the continuous attribute model also results in a considerable difference in the average alter parameter in the models with discretized attributes.

Figure 6.8: Boxplots of the network parameter estimates, for the three average alter levels and the three treatments of the attribute data: as a continuous variable (cont), as a discrete variable with 8 categories (discr8), and as a discrete variable with 14 categories (discr14). The horizontal line indicates the parameter value used for simulation.

Figure 6.9: Boxplots of the attribute parameter estimates, for the three average alter levels and the continuous and two discretization treatments of the attribute data. Panels: (a) continuous attribute model parameters (error, feedback, intercept, average alter); the line segments indicate the 'true' parameter values that were used for simulation. (b) Discrete attribute model parameters (rate, linear shape, quadratic shape, average alter); the 'ncat' row at the bottom of the figure indicates the number of categories (8 or 14) in the discretization.

We now turn to the question of the effect of discretizing continuous actor attributes on stochastic actor-oriented model analyses. For this, we consider the estimated power of the parameters, given by the proportion of simulated data sets for which a parameter estimate was significantly different from 0 (in a two-sided t-test with p < 0.05). We include in this analysis the results of the models in which the average alter parameter was tested using a score-type test.

Table 6.7 shows the power estimates for the attribute ego and average alter parameter. For the other parameters in the model, power estimates were approximately equal across all attribute treatments and average alter levels. Table 6.7 shows the following trend. For the attribute ego parameter, estimated power does not vary much across treatments of the attribute data and average alter levels. The results for the average alter parameter, on the other hand, clearly show an effect of discretization on power. For both the low and the high average alter effect, treating the attributes as continuous variables yields the highest power. This was to be expected, as this is the correct model specification. Moreover, decreasing the number of categories in the discretization also decreases power. Note that the results for the zero average alter condition in Table 6.7 show that the estimated type I error rate (based on t-tests) for the average alter effect is accurate for the continuous attribute model, given that we use a significance level of 0.05.

Table 6.7: Power estimates, when actor attributes are treated as continuous variables (cont), or as discrete variables with 14 categories (dis14) or 8 categories (dis8).

                 average alter = 0.9     average alter = 0.5     average alter = 0
                 cont   dis14  dis8      cont   dis14  dis8      cont   dis14  dis8
attribute ego    0.69   0.68   0.68      0.72   0.73   0.73      0.70   0.70   0.66
average alter    0.99   0.88   0.24      0.62   0.48   0.04      0.05   0.01   0.00
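The power estimate described above is the share of replications in which the test rejects. A minimal Python sketch for the t-ratio case is given below. The critical value 1.96 (a standard normal reference for a two-sided test at the 0.05 level) is an assumption, as are the function name and the made-up estimates; score-type test results, which the study also counts, are not covered by this sketch.

```python
def estimated_power(estimates, standard_errors, critical_value=1.96):
    """Share of replications with |estimate / s.e.| above the critical value."""
    t_ratios = [est / se for est, se in zip(estimates, standard_errors)]
    n_significant = sum(abs(t) > critical_value for t in t_ratios)
    return n_significant / len(t_ratios)


# Hypothetical estimates and standard errors from four replications.
estimates = [0.84, 0.30, 1.10, 0.05]
standard_errors = [0.20, 0.25, 0.40, 0.22]
power = estimated_power(estimates, standard_errors)
```

When the true parameter is zero, the same proportion is an estimate of the type I error rate rather than of power.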

We consider the power estimates for the attribute ego and average alter parameter in detail in Figures 6.10 and 6.11. For all average alter levels, the t-ratio estimates for the attribute ego parameter are similar for each treatment of the attribute data (Figure 6.10). Moreover, there is quite a clear relation between the value of the attribute ego parameter estimate in the continuous attribute model and that in the discrete attribute model.

For the average alter parameter, Figure 6.11 shows a very different picture. The relation between the estimates for the 8 category discretization treatment and the continuous treatment is much less pronounced. For both average alter levels 0.9 and 0.5, the average alter t-ratios corresponding to the discretization treatments of the attribute data are generally smaller than those for the continuous treatment. Moreover, the t-ratios for the 14 category treatment are generally larger than for the 8 category treatment. Note that the 8 category discretization t-ratios in Figure 6.11a convey a biased image. The score-type tests of the average alter effect for the models that originally did not converge were often significant. This result is not shown in the figure.

In conclusion, power for all but the average alter effect is largely unaffected by the discretization of the continuous attribute into 8 or 14 categories. For the average alter parameter, power is highest when the actor attribute is treated as a continuous variable, but the difference with the 14 category discretization is small. The 8 category discretization yields considerably lower power.

6.6 Discussion

In the stochastic actor-oriented model for the co-evolution of social networks and individual behavior proposed by Snijders et al. (2007), actor attributes are assumed to be measured on an ordinal categorical scale. In Chapters 2 and 3, we proposed a stochastic actor-oriented model for the co-evolution of networks and continuous behavior. The models for continuous and discrete attribute evolution are mathematically very different, but their parameters can be matched based on having a similar function in modeling attribute change. In this chapter, we compared treating behavior variables as continuous or discrete, based on real and simulated data.

Figure 6.10: Comparison of the estimates and t-ratios of the attribute ego parameter for continuous versus discretized attribute data and three average alter levels. The red triangles indicate the 14 category estimates and the blue circles indicate the 8 category estimates. Panels: (a) average alter 0.9, estimates; (b) average alter 0.9, t-ratios; (c) average alter 0.5, estimates; (d) average alter 0.5, t-ratios; (e) average alter 0, estimates; (f) average alter 0, t-ratios. Axes: continuous estimate and discrete estimate, or continuous t-ratio and discrete t-ratio.

Figure 6.11: Comparison of the estimates and t-ratios of the average alter parameter for continuous versus discretized attribute data and three average alter levels. The red triangles indicate the 14 category estimates and the blue circles indicate the 8 category estimates. Panels: (a) average alter 0.9, estimates; (b) average alter 0.9, t-ratios; (c) average alter 0.5, estimates; (d) average alter 0.5, t-ratios; (e) average alter 0, estimates; (f) average alter 0, t-ratios. Axes: continuous estimate and discrete estimate, or continuous t-ratio and discrete t-ratio.

In the real data study, we considered 33 small networks of between 26 and 33 students from the Knecht (2004) data, studying the co-evolution of friendship and grades. We find that the treatment of the grade variable affects the grade-related network evolution parameters. For these parameters, increasing the number of categories increases the similarity in the t-ratios between treating grade as a continuous versus a discrete variable. The effect of discretization on grade evolution parameters is much more prominent, and the same regularity is also observed here. Note that, due to convergence problems, we had less information about the discretization with the smallest number of categories than about the other two discretizations.

The basis of the simulated data study was a midsized network of 126 students (Doddema, 2014). The study shows that the results for all but the average alter effect remain largely unaffected by discretizing a continuous attribute into 8 or 14 categories. For the average alter parameter, the 14 category discretization yields larger power than the 8 category discretization.

From these two studies, we can draw the following preliminary conclusions. The power for the average alter parameter is likely to be largest for continuous variables and for discretizations with many categories. The results for the other parameters are unlikely to be affected much by the attribute discretization. An exception may be the attribute-related parameters in a network evolution model; their results may change when a very coarse discretization is used. Although we did not consider this case in our current simulation study, the results of the real data study point to this conclusion.

In the simulation study, we only considered the most basic simulation setup to assess the effect of discretization. We used the same network as initial network for each simulated data set. Alternatively, we could have sampled the initial network from a distribution of networks. This would have led to more variability and thus less clear results. In the future, we will conduct a more extensive simulation study, also systematically varying network size. Network size is an important determinant of statistical power. It will be interesting to explore which combinations of network size and average alter parameter lead to a large reduction in power after discretization. This information could be used to evaluate studies that have been conducted using discretized variables. We expect that for small networks, where peer influence effects are hard to detect (Stadtfeld et al., 2017), changing from a discretized behavior variable to a continuous one will not make a large difference in terms of conclusions based on hypothesis tests.

One of the discretization schemes that has been used to fit continuous variables into the stochastic actor-oriented modeling framework is choosing categories in such a way that they are equally populated. This is a scheme we would not recommend for several reasons. The scheme yields a discrete uniform attribute distribution, and completely does away with the original attribute distribution. Moreover, the interpretation of these categories as each containing a certain proportion of the data is somewhat odd. Over the course of a co-evolution simulation, the attribute distribution is bound to change, rendering the interpretation nonsensical. Finally, a small change in a continuous attribute value is more likely to result in a ‘jump’ in the corresponding discrete attribute value in some regions of the attribute scale than in others.
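The first and last of these objections can be illustrated with a small Python sketch. The scores below are hypothetical right-skewed values on a 1.0 to 4.5 scale, not the study data, and the use of four categories and of quartile cutpoints from the standard library is an assumption made for illustration.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical right-skewed scores (dense near 1.0, sparse near 4.5).
scores = [1.0 + 3.5 * (i / 99) ** 2 for i in range(100)]

# Cutpoints that put an equal share of the observations in each of 4 categories.
cuts = quantiles(scores, n=4)

# Category counts are equal by construction, whatever the shape of the scores.
counts = [0] * 4
for s in scores:
    counts[bisect_right(cuts, s)] += 1

# Category widths on the original scale are far from equal.
edges = [min(scores)] + cuts + [max(scores)]
widths = [upper - lower for lower, upper in zip(edges, edges[1:])]
```

In this example the categories are narrow where the scores are dense and wide where they are sparse, so a change of a given size in the continuous score crosses a category boundary more easily at the low end of the scale than at the high end.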