
The Estimation of Sparse Networks

Internship Report of Daan van Renswoude

University of Amsterdam

D.R.vanRenswoude@uva.nl

July 11, 2014

Abstract

Although pairwise correlations are often used to estimate brain networks (Bullmore and Sporns, 2009; Sporns, 2011), this can lead to spurious connections in the network. Partial correlations are a better representation of the true network (Marrelec et al., 2006), but the use of partial correlations requires more observations than nodes, whereas fMRI data typically has more nodes (voxels) than observations (time points). Friedman et al. (2008) developed the Graphical Lasso (Glasso) to construct sparse networks from partial correlation matrices when the observation to node ratio is smaller than one. In this simulation study the Glasso was used to estimate three of the most frequently used networks: (1) the random network, (2) the preferential attachment network and (3) the small-world network, from data with different observation to nodes ratios. The true underlying networks were compared to the estimated networks with the true positive rate (TPR), false positive rate (FPR) and hit rate. The Glasso estimated random and small-world networks well even with observation to node ratios smaller than one, but had trouble estimating preferential attachment networks with fewer observations than nodes. Since most brain networks have a small-world structure (Sporns et al., 2004; van den Heuvel et al., 2008), it is concluded that the Glasso is a useful tool to estimate brain networks.

Introduction

The use of networks to describe and explain brain function has become increasingly popular in recent years. Behrens and Sporns (2012) concluded that research in this field will probably lead to significant advances in our understanding of human brain activity in the near future. The network most often used to describe and explain brain functioning is the resting state network (Rosazza and Minati, 2011). To construct a resting state network, the measured activity of several brain areas is correlated over time. Regions with a correlation exceeding a certain threshold are assumed to have a functional connection (Bullmore and Sporns, 2009).

An example of a study that used the resting state network to explain brain functioning was conducted by van den Heuvel et al. (2009). They found a negative correlation between average path length in resting state networks and IQ: people with higher IQs had smaller average path lengths in their resting state networks than people with lower IQs. The average path length is the average number of steps needed to get from every node to every other node via the shortest path between these nodes. It is a measure of efficiency, because in efficient networks the number of steps between nodes is small. Therefore, van den Heuvel et al. (2009) concluded that the efficiency of brain networks is related to intellectual performance.

Figure 1: Partial correlation gives a better representation of the true network than pairwise correlation does. (Three panels with nodes A, B and C: the true network, the network recovered by the pairwise correlation method, and the network recovered by the partial correlation method.)

Although this seems like a good example of how brain networks can explain cognitive functioning, such conclusions should be drawn very carefully. The main reason for caution is that van den Heuvel et al. (2009) used pairwise correlations to construct the resting state networks.

Using pairwise correlations to construct resting state networks is common practice (Bullmore and Sporns, 2009; Sporns, 2011), but it is probably also bad practice, because the use of pairwise correlations may lead to spurious connections in the network. For instance, when a brain region A is connected to B and C, but regions B and C are not connected to each other, then B and C will correlate without having a connection (Langford et al., 2001), see Figure 1. Therefore a network based on pairwise correlations might not be a good representation of the true underlying network.

Moreover, in networks based on pairwise correlations clustering is high (Zalesky et al., 2012). This is intuitive when one realizes that transitivity is often used to measure clustering. Transitivity is high when there are many triangles in the network; for instance, when a node X is connected to Y, Y to Z and Z to X, the relation between X, Y and Z is transitive. In the example in the previous paragraph the relation between brain regions A, B and C is not transitive, but after using pairwise correlation to construct the network it is, see Figure 1. Clustering is therefore overestimated, and because high clustering is a property of an efficient network, the effect van den Heuvel et al. (2009) found might be due to the use of pairwise correlations rather than to IQ differences. A better way to construct a resting state network is by using partial correlations.

When partial correlations are used to establish connections between nodes, the influence of all other nodes is taken into account. Therefore spurious connections cannot occur when partial correlation is used. For this reason, networks resulting from partial correlations are a better representation of the true underlying networks than networks resulting from pairwise correlations (Marrelec et al., 2006). Partial correlations can easily be obtained, because the partial correlation matrix can be derived from the inverse of the correlation matrix. For normally distributed random variables the following relationship holds (Lauritzen, 1996)

$$\rho_{ij \cdot V \setminus \{i,j\}} = 0 \iff X_i \perp X_j \mid X_{V \setminus \{i,j\}} \qquad (1)$$

Equation 1 states that the partial correlation between nodes i and j is zero if and only if nodes i and j are independent given all other nodes. For networks this means there is no direct connection between two nodes if the corresponding entry of the partial correlation matrix is zero, and vice versa.
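As an illustration (not part of the original report), the sketch below recovers the partial correlations of the A-B-C example in Figure 1 from the inverse of the correlation matrix; the simulated data and variable names are hypothetical.

# Hypothetical example: A drives both B and C, so B and C correlate spuriously
set.seed(1)
A <- rnorm(500)
B <- A + rnorm(500)
C <- A + rnorm(500)
R <- cor(cbind(A, B, C))   # pairwise correlations: R["B", "C"] is clearly non-zero
P <- -cov2cor(solve(R))    # standardized (sign-flipped) inverse gives the partial correlations
diag(P) <- 1
P["B", "C"]                # near zero: no direct B-C connection remains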

To construct a resting state network, researchers should simply take the partial correlation matrix and assume an edge between regions whose partial correlation is non-zero, instead of using the pairwise correlation matrix. Although this seems straightforward, the approach has a huge drawback when constructing large networks: the number of observations must be larger than the number of nodes to estimate partial correlations. For resting state networks this is a problem, because participants can only lie in the fMRI scanner for a limited amount of time. Therefore there are almost always few observations compared to the number of possible nodes (voxels).

Friedman et al. (2008) proposed a possible solution to this problem. They developed the Graphical Lasso (Glasso), which makes it possible to construct a network even when the number of observations is smaller than the number of nodes. The Glasso works well when the network is sparse, meaning there are few connections in the network. With few connections the partial correlation matrix contains many zeros, and the Glasso exploits this fact because many possible connections can be set to zero during estimation. In this paper the Glasso will be used to estimate three of the most frequently used networks: (1) the random network (Erdős and Rényi, 1960), (2) the preferential attachment network (Barabási and Albert, 1999) and (3) the small-world network (Watts and Strogatz, 1998). It will be tested how well the Glasso constructs these three networks based on observation to nodes ratios of 1/8, 1/4, 1/2, 1, 2, 4 and 8.

The three types of networks will be simulated and the Glasso will be used to reconstruct the underlying network. The networks the Glasso returns will be compared with the true networks. To see how well the Glasso does, the true positive rate (TPR), false positive rate (FPR) and hit rate will be plotted for the seven observation to nodes ratios for each network. If the TPR increases and the FPR decreases with higher observation to nodes ratios, this is an indication that the Glasso works well for that type of network. Another indication that the Glasso works well would be a TPR above 50% even when the observation to nodes ratio is smaller than one.

Method

To test the Glasso several steps were taken. First, the correlation matrices of three networks (a random network, a preferential attachment network and a small-world network) were simulated. Secondly, these correlation matrices were transformed to partial correlation matrices (PCMs). These PCMs were regularized to have the proper rank for a multivariate normal distribution and transformed back to correlation matrices in order to generate data with the seven different observation to nodes ratios. Thirdly, the Glasso was used to reconstruct the three networks for the seven observation to nodes ratios. Finally, to compare the networks returned by the Glasso with the true networks, the true positive rate (TPR), false positive rate (FPR) and hit rate were calculated. All these steps are described in detail in this section. The study was conducted in R (R Core Team, 2014) and all code and functions needed to perform it can be found in Appendix A.

The random network model

The Erdős-Rényi method in the R-package igraph (Csardi and Nepusz, 2006) was used to simulate a random network (RN). In Erdős-Rényi networks every possible edge is created with the same probability, in this case 10%: all 200 nodes are represented in the network and every combination of nodes has a probability of 10% of getting an edge assigned. This led to a random network with 1946 edges, an average path length of 2.04 and a clustering coefficient of 0.1.
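A minimal sketch of this step, consistent with the generation code in Appendix A (exact values depend on the random seed):

library(igraph)
set.seed(1)
ER <- erdos.renyi.game(200, 1/10)  # each possible edge is created with probability .10
ecount(ER)                         # number of edges (1946 in the text)
average.path.length(ER)            # ~2.04
transitivity(ER)                   # clustering coefficient, ~0.1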

The preferential attachment network model

To generate a preferential attachment network, the Barabási-Albert method of the igraph package was used. This method generates networks by an iterative process in which a node is added in every iteration; in this study one node with ten edges was added in every iteration. The Barabási-Albert method uses a simple algorithm in which nodes with a high degree have a higher probability of being assigned an edge than nodes with a lower degree. The probability for a node i to get an edge assigned is

$$P_i = \frac{\text{Degree}_i + 1}{\sum_j \text{Degree}_j + N}$$

For instance, when there are twenty nodes of which four have a degree of five and sixteen a degree of one, the chance that one of the nodes with a degree of five gets assigned an edge is

$$P_i = \frac{\text{Degree}_i + 1}{\sum_j \text{Degree}_j + N} = \frac{5 + 1}{36 + 20} = \frac{6}{56} = \frac{3}{28},$$

whereas the chance that a node with a degree of one gets an edge assigned is only

$$P_i = \frac{\text{Degree}_i + 1}{\sum_j \text{Degree}_j + N} = \frac{1 + 1}{36 + 20} = \frac{2}{56} = \frac{1}{28}.$$

The Barabási-Albert method generated a network with 1945 edges, an average path length of 1.91 and a clustering coefficient of 0.2.
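The attachment probabilities from the worked example above can be checked directly; the sketch below is illustrative only, using the hypothetical degree sequence from the text.

# Twenty hypothetical nodes: four hubs of degree five, sixteen nodes of degree one
degrees <- c(rep(5, 4), rep(1, 16))
N <- length(degrees)                     # 20 nodes
P <- (degrees + 1) / (sum(degrees) + N)  # attachment probability per node
P[1]                                     # hub:  6/56 = 3/28
P[20]                                    # leaf: 2/56 = 1/28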

The small-world network model

To generate a small-world (SW) network, the Watts-Strogatz method of the igraph package was used. In this method the 200 nodes were placed in a ring lattice with edges to the 10 closest neighbours. Every edge had a probability of .05 of being rewired randomly to another node in the graph. This resulted in a network with 2000 edges, an average path length of 2.44 and a clustering coefficient of 0.53. Compared to the random network, clustering is about five times as high while the average path length is only a bit longer. This leads to a small-world ratio of

$$\text{SW ratio} = \frac{\text{Clustering}_{SW} \,/\, \text{Clustering}_{RN}}{\text{Path length}_{SW} \,/\, \text{Path length}_{RN}} = \frac{0.53 / 0.1}{2.44 / 2.04} = 4.43.$$

When the small-world ratio is larger than 3, a network is considered to be a small-world network; it can therefore be concluded that this is indeed a small-world network.
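The sketch below, matching the generation code in Appendix A, computes this ratio with igraph (exact values vary with the random seed):

library(igraph)
set.seed(1)
RN <- erdos.renyi.game(200, 1/10)
SW <- watts.strogatz.game(1, 200, 10, 0.05)  # ring lattice, 10 neighbours, 5% rewiring
sw.ratio <- (transitivity(SW) / transitivity(RN)) /
  (average.path.length(SW) / average.path.length(RN))
sw.ratio                                     # ~4.4; larger than 3 indicates a small world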

Simulation of partial correlation matrices

To generate partial correlation matrices for the networks, the three networks were represented as adjacency matrices. In an adjacency matrix all edges get the value one and all non-edges the value zero. These adjacency matrices were element-wise multiplied by a matrix filled with numbers drawn from the uniform distribution, and after the multiplication the diagonal was set to one. This resulted in a partial correlation matrix with a correlation wherever there was an edge and a zero wherever there was none. The generated matrices were not positive definite and needed to be regularized.
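A minimal sketch of this construction, consistent with Appendix A (the object names are illustrative):

adj <- as.matrix(get.adjacency(ER))              # 1 = edge, 0 = no edge
w   <- matrix(runif(200 * 200, .1, 1), 200, 200)
w   <- (w + t(w)) / 2                            # symmetrize the uniform weights
pcm <- adj * w                                   # keep a weight only where an edge exists
diag(pcm) <- 1
min(eigen(pcm)$values)                           # negative: not yet positive definite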

Regularization

One of the assumptions of the Glasso is that the data used to estimate networks is multivariate normally distributed. To allow data to be drawn from a multivariate normal distribution, the partial correlation matrices were regularized, which requires all eigenvalues to be positive. This was done by taking the absolute value of the smallest (negative) eigenvalue of the matrix and adding that value to all eigenvalues; these new (all positive) eigenvalues were then used as the eigenvalues of the partial correlation matrix. This procedure ‘kills’ large correlations and makes them smaller in order to fit a multivariate normal distribution. The correlations became a lot smaller after regularization, with maximum correlations around .2 for the random network, .08 for the preferential attachment network and .25 for the small-world network.
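A minimal sketch of this eigenvalue shift, assuming the pcm object from the earlier sketch (regularization.lourens in Appendix A implements the same idea via the diagonal):

regularize <- function(m, offset = .1) {
  shift <- abs(min(eigen(m)$values)) + offset  # magnitude of the smallest eigenvalue
  # Adding shift to the diagonal raises every eigenvalue by shift;
  # dividing by (1 + shift) rescales so the diagonal stays 1
  (m + shift * diag(nrow(m))) / (1 + shift)
}
pcm.reg <- regularize(pcm)
max(pcm.reg[upper.tri(pcm.reg)])               # correlations shrink, e.g. to ~.2 for the RN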

Data simulation

Data sets based on the true underlying networks were simulated in order to compare the estimated networks to the true networks. To make sure the simulated data had the true networks as underlying structure, the following procedure was followed. The inverse of the regularized partial correlation matrix of each network was used as covariance matrix to simulate data. That covariance matrix was Cholesky decomposed, transposed and multiplied with a matrix of size 200 x (number of observations) filled with numbers independently drawn from the normal distribution with mean zero and standard deviation one. This was done for all seven numbers of observations: data were generated for 25, 50, 100, 200, 400, 800 and 1600 observations. Since the networks have 200 nodes, this resulted in observation to nodes ratios of 1/8, 1/4, 1/2, 1, 2, 4 and 8.
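A minimal sketch of this simulation step, assuming pcm.reg from the sketch above (the function name is illustrative; data.generator in Appendix A is the full version):

simulate_data <- function(pcm.reg, obs) {
  cvm <- solve(pcm.reg)                                # covariance implied by the regularized PCM
  z <- matrix(rnorm(nrow(cvm) * obs), nrow(cvm), obs)  # independent N(0, 1) draws
  t(t(chol(cvm)) %*% z)                                # obs x 200 data with covariance cvm
}
dat <- simulate_data(pcm.reg, 25)                      # n/p = 25/200 = 1/8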

Estimation using the Glasso

The generated data of all three networks with the seven different numbers of observations were used to estimate the underlying adjacency matrix with the R-package glasso (Friedman et al., 2013). The Glasso is an algorithm that exploits the fact that the inverse of the covariance matrix contains many zeros when the network it estimates is sparse. This makes it possible to estimate the inverse of the covariance matrix even when there are few observations, because many possible correlations can be set to zero and do not have to be estimated. The Glasso estimates the inverse covariance matrix $\Sigma^{-1}$ as the maximum of the penalized log-likelihood

$$\log |\Sigma^{-1}| - \mathrm{tr}(S\Sigma^{-1}) - \rho \|\Sigma^{-1}\|_1$$

over nonnegative definite matrices (Friedman et al., 2008). Here $S$ is the empirical covariance matrix, $\mathrm{tr}$ is the trace, $\|\Sigma^{-1}\|_1$ is the sum of the absolute values of the elements of $\Sigma^{-1}$, and $\rho$ is the penalty. When $\rho$ is higher, more possible correlations are set to zero than when $\rho$ is lower. Because it is not known whether there is an optimal value for $\rho$, 50 different values (ranging from .02 to 1) were used every time the Glasso estimated an inverse covariance matrix. The Glasso also returns a log-likelihood, and the inverse covariance matrix with the highest log-likelihood was always used to derive the adjacency matrix.
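A sketch of this penalty search, following the TrueVSglasso function in Appendix A (dat is the simulated data from the previous sketch):

library(glasso)
rhos <- (1:50) / 50                            # 50 penalties, from .02 to 1
fits <- lapply(rhos, function(r) glasso(cov(dat), rho = r))
best <- fits[[which.max(sapply(fits, function(f) f$loglik))]]
adj.rec <- ifelse(best$wi > 0, 1, 0)           # edge where the precision entry is positive
diag(adj.rec) <- 0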

Judging the accuracy of the Glasso

For a true graph $G = (V, E)$ and an estimated graph $\hat{G} = (\hat{V}, \hat{E})$, let $A = \{(i,j) \in E\}$, $\hat{A} = \{(i,j) \in \hat{E}\}$, $B = \{(i,j) \notin E\}$ and $\hat{B} = \{(i,j) \notin \hat{E}\}$. The TPR is then defined as

$$TPR = \frac{|\hat{A} \cap A|}{|A|},$$

the FPR is defined as

$$FPR = \frac{|\hat{A} \cap B|}{|B|},$$

and the hit rate, the proportion of all node pairs classified correctly, is defined as

$$\text{hit rate} = \frac{|(\hat{A} \cap A) \cup (\hat{B} \cap B)|}{|A| + |B|}.$$

Figure 2: Degree distributions of the three different networks (histograms for the random, preferential attachment and small-world networks)

The TPR, FPR and hit rate were used to compare true adjacency matrices with estimated adjacency matrices. The estimated adjacency matrices were created using four different thresholds: (1) no threshold, (2) a threshold at 25% of the non-zero correlations, (3) a threshold at 50% of the non-zero correlations, and (4) a threshold at 75% of the non-zero correlations. All correlations exceeding the threshold were recoded into one; correlations below the threshold were set to zero.
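As a toy check of these measures (not from the report), the counts below reproduce the three-node example of Figure 1, where one spurious edge is added:

adj.true <- matrix(c(0,1,0, 1,0,1, 0,1,0), 3, 3)  # true network: one hub, two edges (cf. Figure 1)
adj.est  <- matrix(c(0,1,1, 1,0,1, 1,1,0), 3, 3)  # estimate adds one spurious edge
TP <- sum(adj.true == 1 & adj.est == 1) / 2       # 2 correctly recovered edges
FP <- sum(adj.true == 0 & adj.est == 1) / 2       # 1 spurious edge
P  <- sum(adj.true == 1) / 2                      # 2 true edges
N  <- (sum(adj.true == 0) - 3) / 2                # 1 true non-edge (diagonal excluded)
c(TPR = TP / P, FPR = FP / N)                     # TPR = 1, FPR = 1
1 - sum(adj.true != adj.est) / (3 * 2)            # hit rate = 2/3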

Results

After regularization the partial correlation matrices of the three types of networks did not contain any high correlations. The random network had correlations between 0 and .2, the preferential attachment network between 0 and .08 and the small-world network between 0 and .25. Especially in the preferential attachment network, the correlations that remained after regularization were very low. The networks showed their expected properties: the small-world network had the highest clustering, all networks had a small average path length, and the degree distributions showed that the preferential attachment network had mostly nodes with a small degree, while the degrees in the small-world and random networks were normally distributed, see Figure 2. This is exactly what is expected of these networks, and it can therefore be concluded that the simulation of the networks went well.

Figure 3 shows the TPR, FPR and hit rate of the random network. From Figure 3 it is clear that the FPR hardly changes when only the higher correlations are estimated instead of all correlations. This is what would be expected because the networks are sparse, so only few edges are estimated and therefore also few edges are estimated incorrectly. The FPR shows a maximum around an observation to nodes ratio of one. This could be expected because, for ratios smaller than one, there are few observations and many possible edges are automatically set to zero in order to make estimation possible. Fewer edges are set to zero as the observation to nodes ratio increases, and therefore more edges are wrongly estimated. When the observation to nodes ratio increases beyond one, enough data are available and the FPR decreases again. The TPR does change: as expected, it goes up to 100% as the observation to nodes ratio increases, and it does so even faster when only the higher correlations are estimated.

Figure 3: The true positive rate (TPR, circles), false positive rate (FPR, triangles) and hit rate (crosses) for the seven observation to nodes ratios (n/p) of the random network, for the four thresholds (all correlations, top 75%, top 50% and top 25% estimated)

Figure 4: The true positive rate (TPR, circles), false positive rate (FPR, triangles) and hit rate (crosses) for the seven observation to nodes ratios (n/p) of the small-world network, for the four thresholds

Figure 5: The true positive rate (TPR, circles), false positive rate (FPR, triangles) and hit rate (crosses) for the seven observation to nodes ratios (n/p) of the preferential attachment network, for the four thresholds

It is also important to note that the TPR is above 50% most of the time, even when the observation to nodes ratio is smaller than one.

The hit rate shows the opposite pattern of the FPR. This makes sense, because the FPR is the percentage of incorrectly estimated edges, while the hit rate is the percentage of correctly estimated edges and non-edges. Because there are few edges, the hit rate depends mainly on the FPR, and therefore the two show opposite patterns. The results for the small-world network look very similar to those for the random network, as can be seen in Figure 4.

From Figure 4 it can be seen that the small-world network shows the same trend as the random network. The TPR increases as the number of observations increases, and that effect is even stronger when only the higher correlations are estimated instead of all correlations. Just like in the random network, the TPR is above 50% for most of the observation to nodes ratios, even when data are sparse. This indicates that Glasso estimation is useful when dealing with small-world networks with few observations.

From Figure 5 it becomes clear that the estimation of the preferential attachment network is not as good as that of the random and small-world networks. The TPR does not increase as fast as it does for the random and small-world networks and, in most cases, does not exceed 50% when the observation to nodes ratio is smaller than one. The hit rate, however, is still comparable with the hit rate in the random and small-world networks; as mentioned before, this is because the hit rate depends much more on the FPR than on the TPR. Although Glasso estimation seems useful for random and small-world networks, even with few observations, it seems less useful for preferential attachment networks. Possible explanations are discussed below.

Discussion

The Glasso is able to estimate random and small-world networks from data with few observations, but has problems estimating preferential attachment networks with few observations. That the preferential attachment networks are not well estimated may be due to the regularization process.

The regularization process had a larger impact on the preferential attachment network than on the other two networks. In the regularization process the larger correlations are ‘killed’ and made smaller to fit a multivariate normal distribution. After regularization the highest correlation in the preferential attachment network was .08, whereas the highest correlation was .2 in the random network and .25 in the small-world network. A possible explanation is that preferential attachment networks have a few nodes that correlate with many other nodes and many nodes with only a few correlations with other nodes, while in the random and small-world networks the correlations are more evenly distributed over the nodes. A more even distribution of correlations through the network may be more similar to a multivariate normal distribution than the distribution of correlations in the preferential attachment network. This may explain why the regularization process affected the preferential attachment network more than the other networks.

Although the regularization process may be a problem, the structure of the preferential attachment network might also be a reason the Glasso could not estimate these networks as accurately as random and small-world networks. When a partial correlation is estimated, the influence of all other correlations is taken into account. Because the few nodes that correlate with a lot of other nodes are controlled for when the partial correlation between two nodes is estimated, the remaining correlation might be too weak to pick up. This can be especially problematic since the correlations were already very low after regularization.

In conclusion, the Glasso has difficulty estimating preferential attachment networks, but is able to estimate random and small-world networks. The small-world networks yielded even slightly better results than the random networks, and that is a promising finding. Brain networks are usually considered to have a small-world structure (Sporns et al., 2004; van den Heuvel et al., 2008), so based on this simulation study the Glasso is expected to estimate brain networks accurately, even when there are few observations.

Acknowledgements

I would like to thank Lourens Waldorp very much for supervising this project.

References

Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439):509–512.

Behrens, T. E. and Sporns, O. (2012). Human connectomics. Current Opinion in Neurobiology, 22(1):144–153.

Bullmore, E. and Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3):186–198.

Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5).

Erdős, P. and Rényi, A. (1960). On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5:17–61.

Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.

Friedman, J., Hastie, T., and Tibshirani, R. (2013). Package ‘glasso’.

Langford, E., Schwertman, N., and Owens, M. (2001). Is the property of being positively correlated transitive? The American Statistician, 55(4):322–325.

Lauritzen, S. L. (1996). Graphical Models. Oxford University Press.

Marrelec, G., Krainik, A., Duffau, H., Pélégrini-Issac, M., Lehéricy, S., Doyon, J., and Benali, H. (2006). Partial correlation for functional brain interactivity investigation in functional MRI. NeuroImage, 32(1):228–237.

R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rosazza, C. and Minati, L. (2011). Resting-state brain networks: literature review and clinical applications. Neurological Sciences, 32(5):773–785.

Sporns, O. (2011). The human connectome: a complex network. Annals of the New York Academy of Sciences, 1224(1):109–125.

Sporns, O., Chialvo, D. R., Kaiser, M., and Hilgetag, C. C. (2004). Organization, development and function of complex brain networks. Trends in Cognitive Sciences, 8(9):418–425.

van den Heuvel, M. P., Stam, C. J., Boersma, M., and Hulshoff Pol, H. (2008). Small-world and scale-free organization of voxel-based resting-state functional connectivity in the human brain. NeuroImage, 43(3):528–539.

van den Heuvel, M. P., Stam, C. J., Kahn, R. S., and Hulshoff Pol, H. E. (2009). Efficiency of functional brain networks and intellectual performance. The Journal of Neuroscience, 29(23):7619–7624.

Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442.

Zalesky, A., Fornito, A., and Bullmore, E. (2012). On the use of correlation as a measure of network connectivity. NeuroImage, 60(4):2096–2106.


Appendix A

This appendix contains the code used to perform this study. First, the necessary packages are loaded.

> library("corpcor") > library("glasso") > library("igraph")

Load functions to generate and test data. The functions below are written by Lourens Waldorp, Alexander Savi and Verena Schmittmann.

Function to calculate the TPR and FPR

accuracy <- function(adj.orig, adj.rec) {
  if (!all(diag(adj.orig) == 0)) {
    warning("Values other than 0 on the diagonal of original (input) adjacency matrix")
  }
  if (!all(diag(adj.rec) == 0)) {
    warning("Values other than 0 on the diagonal of recovered (output) adjacency matrix")
  }
  # Count edge classifications; divide by 2 because the matrices are symmetric
  FP <- length(which(adj.orig == 0 & adj.rec == 1)) / 2
  FN <- length(which(adj.orig == 1 & adj.rec == 0)) / 2
  TP <- length(which(adj.orig == 1 & adj.rec == 1)) / 2
  TN <- (length(which(adj.orig == 0 & adj.rec == 0)) - length(diag(adj.orig))) / 2
  N <- (length(which(adj.orig == 0)) - length(diag(adj.orig))) / 2
  P <- length(which(adj.orig == 1)) / 2
  FPR <- FP / N
  FNR <- FN / P
  TPR <- TP / P
  TNR <- TN / N
  # The return statement was garbled in the original; reconstructed as a named vector
  return(setNames(c(TPR, TNR, FPR, FNR), c("TPR", "TNR", "FPR", "FNR")))
}

Function to generate data

data.generator <- function(cvm, obs, rho, snr) {
  signalv <- signal.variance(cvm, obs, rho)
  noisev <- signalv / snr
  # Toeplitz matrix for AR(1)-like temporal dependence between observations
  tplv <- c(1)
  for (i in 2:obs) {
    tplv[i] <- rho * tplv[i - 1]
  }
  tpl <- toeplitz(tplv)
  n <- nrow(cvm)
  zsamp <- matrix(NA, n, obs)
  zsamp.noise <- matrix(NA, n, obs)
  for (i in 1:obs) {
    zsamp[, i] <- rnorm(n, 0, 1)
    zsamp.noise[, i] <- rnorm(n, 0, 1)
  }
  chol.nod <- chol(cvm)
  chol.time <- chol(tpl)
  if (snr != 111 & snr != 222 & snr != 333) {
    # Signal with temporal dependence plus noise
    nod <- t(chol.nod) %*% zsamp %*% chol.time + zsamp.noise
    tnod <- t(nod)
  }
  if (snr == 111) {
    # Noise-free draws with the requested covariance (the setting used in this study)
    nod <- t(chol.nod) %*% zsamp
    tnod <- t(nod)
  }
  if (snr == 222) {
    # Temporal dependence only
    tnod <- t(zsamp %*% chol.time)
  }
  if (snr == 333) {
    # Pure noise
    tnod <- t(zsamp.noise)
  }
  return(tnod)
}


signal.variance <- function(cvm, obs, rho) {
  # Calculate maximum of variance
  var.max <- max(abs(diag(cvm)))
  # Calculate correction to number of observations
  T <- obs * ((1 - rho) / (1 + rho))
  # Calculate signal variance
  signal <- T * var.max
  # Return signal variance
  return(signal)
}

Function to regularize the covariance matrix

regularization.lourens <- function(a, offset) {
  # Regularize: shift all eigenvalues up, then rescale so the diagonal stays 1
  mat <- (a + (diag(dim(a)[1]) * (abs(min(eigen(a)$values)) + offset))) /
    (1 + (abs(min(eigen(a)$values)) + offset))
  # Check if regularized matrix is positive definite
  if (!is.positive.definite(mat)) {
    warning("Not positive definite, no good. (after regularization)")
    print(min(eigen(mat)$values))
  }
  # Return regularized matrix
  return(mat)
}

The code below was written by Daan van Renswoude.

Function to test the performance of the Glasso

TrueVSglasso <- function(pcm, cvm, adj.all) {
  hit.rate <- numeric()
  # Threshold the true PCM at the quartiles of its non-zero entries
  adj.25 <- ifelse(pcm > quantile(pcm[pcm > 0])[2], 1, 0)
  adj.50 <- ifelse(pcm > quantile(pcm[pcm > 0])[3], 1, 0)
  adj.75 <- ifelse(pcm > quantile(pcm[pcm > 0])[4], 1, 0)
  diag(adj.25) <- 0
  diag(adj.50) <- 0
  diag(adj.75) <- 0
  acc <- matrix(0, 28, 4)
  obs <- 25
  for (j in 1:7) {
    dat <- data.generator(solve(cvm), obs = obs, rho = .5, snr = 111)
    obs <- obs * 2
    # Fit the Glasso for 50 penalty values and keep the fit with the highest log-likelihood
    model <- list()
    loglik <- c()
    for (i in 1:50) {
      model[[i]] <- glasso(cov(dat), rho = (i / 50))
      loglik[i] <- model[[i]]$loglik
    }
    adj.rec <- ifelse(model[[which.max(loglik)]]$wi > 0, 1, 0)
    diag(adj.rec) <- 0
    # Accuracy against the full and thresholded true adjacency matrices
    acc[j * 4 - 3, ] <- accuracy(adj.all, adj.rec)
    acc[j * 4 - 2, ] <- accuracy(adj.25, adj.rec)
    acc[j * 4 - 1, ] <- accuracy(adj.50, adj.rec)
    acc[j * 4, ] <- accuracy(adj.75, adj.rec)
    hit.rate[j * 4 - 3] <- 1 - sum(as.logical(abs(adj.all - adj.rec))) / (200 * (200 - 1))
    hit.rate[j * 4 - 2] <- 1 - sum(as.logical(abs(adj.25 - adj.rec))) / (200 * (200 - 1))
    hit.rate[j * 4 - 1] <- 1 - sum(as.logical(abs(adj.50 - adj.rec))) / (200 * (200 - 1))
    hit.rate[j * 4] <- 1 - sum(as.logical(abs(adj.75 - adj.rec))) / (200 * (200 - 1))
  }
  return(cbind(acc, hit.rate))
}

Code to generate the networks and a random partial correlation matrix (rpcm)

set.seed(1)
# Create the adjacency matrices
ER <- erdos.renyi.game(200, 1/10)
BA <- barabasi.game(200, m = 10, directed = F)
SW <- watts.strogatz.game(1, 200, 10, 0.05)
# Create a random partial correlation matrix
rpcm <- matrix(runif(40000, .1, 1), 200, 200)
rpcm <- (rpcm + t(rpcm)) / 2


Code to perform the simulations for the random network. Only the code for the Erdős-Rényi network is shown; the other networks were processed using the exact same method, but with another network as input. The regularization offsets did differ: .05 for the SW network, .1 for the random network and 1 for the PA network.

# Get the adjacency matrix
adj.all <- get.adjacency(ER)
adj.all <- as.matrix(adj.all)
# Make the covariance matrix
cvm <- adj.all * rpcm
diag(cvm) <- 1
# Regularize the cvm
cvm <- regularization.lourens(cvm, .1)
# Transform cvm matrix to partial correlation matrix
pcm <- cov2cor(cvm)
# Calculate the TPR, FPR and hit rate
plotER <- TrueVSglasso(pcm, cvm, adj.all)
# Save to data
write.table(plotER, 'plotER.txt')

Function to plot Figures 3, 4, and 5

PLOT <- function(acc, hit.rate) {
  layout(matrix(1:4, 2, 2))
  titles <- c('All correlations estimated', 'Top 75% estimated',
              'Top 50% estimated', 'Top 25% estimated')
  # One panel per threshold; rows k, k+4, ..., k+24 of acc belong to threshold k
  for (k in 1:4) {
    idx <- seq(k, k + 24, 4)
    plot(acc[idx, 1], ylim = c(0, 1), bty = 'n', xlab = 'n/p', ylab = '',
         xaxt = 'n', las = 2, type = 'b', main = titles[k])
    axis(side = 1, at = 1:7, labels = c("1/8", "1/4", "1/2", "1", "2", "4", "8"))
    par(new = TRUE)
    plot(acc[idx, 3], ylim = c(0, 1), bty = 'n', xlab = '', ylab = '',
         xaxt = 'n', las = 2, pch = 2, type = 'b', yaxt = 'n')
    par(new = TRUE)
    plot(hit.rate[idx], ylim = c(0, 1), bty = 'n', xlab = '', ylab = '',
         xaxt = 'n', las = 2, pch = 3, type = 'b', yaxt = 'n')
    legend(4, .2, c('TPR', 'FPR', 'Hit rate'), pch = 1:3, bty = 'n',
           horiz = TRUE, xjust = .5)
  }
}

Code to plot Figure 1

layout(matrix(1:3, 1, 3))
TrueGraph <- matrix(c(0,1,0, 1,0,1, 0,1,0), 3, 3)
CorGraph <- matrix(c(0,1,1, 1,0,1, 1,1,0), 3, 3)
plot(graph.adjacency(TrueGraph, mode = 'undirected'), layout = layout.circle,
     main = 'True Network', vertex.label = LETTERS[c(3,1,2)],
     vertex.label.cex = 3, vertex.size = 60, vertex.label.color = 'black',
     vertex.color = 'white', edge.width = 7, edge.color = 'gray',
     vertex.frame.color = NA)
plot(graph.adjacency(CorGraph, mode = 'undirected'), layout = layout.circle,
     main = 'Pairwise Correlation Method', vertex.label = LETTERS[c(3,1,2)],
     vertex.label.cex = 3, vertex.size = 60, vertex.label.color = 'black',
     vertex.color = 'white', edge.width = 7, edge.color = 'gray',
     vertex.frame.color = NA)
plot(graph.adjacency(TrueGraph, mode = 'undirected'), layout = layout.circle,
     main = 'Partial Correlation Method', vertex.label = LETTERS[c(3,1,2)],
     vertex.label.cex = 3, vertex.size = 60, vertex.label.color = 'black',
     vertex.color = 'white', edge.width = 7, edge.color = 'gray',
     vertex.frame.color = NA)

Code to plot Figure 2

layout(matrix(1:3, 1, 3))
hist(degree(ER), main = 'Random Network', xlab = 'Degree',
     las = 1, col = 'gray', border = 'gray')
hist(degree(BA), main = 'Preferential Attachment Network',
     xlab = 'Degree', las = 1, col = 'gray', border = 'gray')
hist(degree(SW), main = 'Small-World Network', xlab = 'Degree',
     las = 1, col = 'gray', border = 'gray')
