
6.3 Infinite Motif Stochastic Blockmodel

6.3.2 The Inference Algorithm

The aim of the inference algorithm for IMM is to infer the latent variables z and B from the observed motif sequence. In this work, we use Gibbs sampling to approximate the conditional probability of role assignment, since it is a moderately efficient method for Bayesian models. Due to the page limit, we omit the detailed derivation and list only the conditional distribution used for sampling:

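$$
P(z_i = k \mid M, Z_{-i}, \alpha, \beta) \propto
\begin{cases}
\dfrac{n_{i,k}}{N - 1 + \alpha} \prod_{u,v,w} \Gamma(m_{uvw} + \beta) & \text{for an existing cluster} \\[1ex]
\dfrac{\alpha}{N - 1 + \alpha} \prod_{u,v,w} \Gamma(m_{uvw} + \beta) & \text{for a new cluster}
\end{cases}
\tag{6.7}
$$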

where Γ(n) = (n − 1)! for positive integers n is the Gamma function. The Gibbs sampling algorithm for IMM inference is shown in Algorithm 4.
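A single Gibbs step of this kind can be sketched as follows. This is a minimal illustration, not the procedure of Algorithm 4. It assumes the standard Chinese restaurant process weights (an existing cluster is weighted by the number of other nodes it contains, a new cluster by α), and it assumes that the motif counts m_uvw are supplied separately for each candidate cluster. The function names `log_gamma_product`, `crp_assignment_probs` and `sample_assignment` are hypothetical. The products of Gamma functions are evaluated in log space with `math.lgamma`, because Γ grows factorially and overflows quickly.

```python
import math
import random


def log_gamma_product(motif_counts, beta):
    """Log of prod Gamma(m + beta) over the given motif counts (assumed inputs)."""
    return sum(math.lgamma(m + beta) for m in motif_counts)


def crp_assignment_probs(cluster_sizes, alpha, log_liks):
    """Normalized probabilities of joining each existing cluster or a new one.

    cluster_sizes: sizes of the non-empty existing clusters, excluding node i.
    log_liks: one log-likelihood term per existing cluster, plus a final
              entry for the new cluster.
    """
    n_others = sum(cluster_sizes)  # this is N - 1
    log_norm = math.log(n_others + alpha)
    log_w = [math.log(n_k) - log_norm + ll
             for n_k, ll in zip(cluster_sizes, log_liks)]
    log_w.append(math.log(alpha) - log_norm + log_liks[-1])
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_w)
    weights = [math.exp(x - top) for x in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def sample_assignment(probs, rng):
    """Draw a cluster index; the last index means 'open a new cluster'."""
    return rng.choices(range(len(probs)), weights=probs)[0]


if __name__ == "__main__":
    probs = crp_assignment_probs([3, 1], alpha=1.0, log_liks=[0.0, 0.0, 0.0])
    print(probs, sample_assignment(probs, random.Random(0)))
```

With equal likelihood terms the probabilities reduce to the prior weights, so the sketch can be checked by hand against the cluster sizes and α.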

6.4 Experiments

6.4.1 Experimental Setup

We evaluate our model on role discovery using synthetic and real-world networks. For synthetic data, we generate two networks using SBM [HLL83]; they are visualized in Figure 6.5. Each network consists of 100 nodes, and these nodes belong to 4 roles. For real-world networks, we use the Zachary karate and Les Misérables networks. IMM is compared to baseline models including MMSB, MMTM and IRM. We use the same (hyper)parameters in all models: CRP parameter α = 5, Dirichlet parameter η = 4 and Beta parameter β = 7. We leave hyperparameter selection as future work.
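For illustration, a synthetic network of this size can be drawn from a stochastic blockmodel as sketched below. The block probabilities (0.5 within a role, 0.05 between roles) and the equal role sizes are hypothetical choices made for the example; the values used to generate the two networks in Figure 6.5 are not stated here. The function name `sample_sbm` is also an assumption.

```python
import random


def sample_sbm(n_nodes, n_roles, block_probs, seed=0):
    """Sample an undirected SBM graph with equally sized roles.

    block_probs[a][b] is the probability of an edge between a node of
    role a and a node of role b (hypothetical values in the example).
    Returns the role of each node and a symmetric 0/1 adjacency matrix.
    """
    rng = random.Random(seed)
    roles = [i * n_roles // n_nodes for i in range(n_nodes)]
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < block_probs[roles[i]][roles[j]]:
                adj[i][j] = adj[j][i] = 1
    return roles, adj


if __name__ == "__main__":
    # Diagonal block structure: dense within roles, sparse between them.
    probs = [[0.5 if a == b else 0.05 for b in range(4)] for a in range(4)]
    roles, adj = sample_sbm(100, 4, probs)
    print(sum(map(sum, adj)) // 2, "edges")
```

A non-diagonal block structure is obtained by placing the large probabilities off the diagonal of `block_probs`.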

6.4.2 Evaluation Metrics

To evaluate the experimental results, we use purity and normalized mutual information (NMI) as the evaluation metrics. These metrics are widely used to evaluate clustering methods when ground-truth labels are available. Formal definitions of both metrics are given in Section 3.6.1 of Chapter 3.
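Both metrics can be computed from a predicted and a ground-truth label list, as in the minimal sketch below. It assumes the common definitions: purity is the fraction of nodes that fall in the majority true role of their predicted cluster, and NMI is the mutual information divided by the arithmetic mean of the two entropies. The normalization in Section 3.6.1 may differ (e.g., geometric mean), so the NMI values of this sketch are not guaranteed to match Table 6.3.

```python
import math
from collections import Counter


def purity(pred, truth):
    """Fraction of items in the majority true class of their predicted cluster."""
    best = {}
    for (p, _), count in Counter(zip(pred, truth)).items():
        best[p] = max(best.get(p, 0), count)
    return sum(best.values()) / len(pred)


def entropy(counts, n):
    """Shannon entropy (in nats) of a label distribution given by counts."""
    return -sum(c / n * math.log(c / n) for c in counts)


def nmi(pred, truth):
    """Mutual information normalized by the mean entropy (assumed variant)."""
    n = len(pred)
    cp, ct = Counter(pred), Counter(truth)
    joint = Counter(zip(pred, truth))
    mi = sum(c / n * math.log(c * n / (cp[p] * ct[t]))
             for (p, t), c in joint.items())
    denom = (entropy(cp.values(), n) + entropy(ct.values(), n)) / 2
    # Two trivial one-cluster labelings are treated as identical.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    print(purity([0, 0, 1, 1], [1, 1, 0, 0]), nmi([0, 0, 1, 1], [1, 1, 0, 0]))
```

Both functions are invariant to relabeling of the clusters, which is why a permuted but otherwise perfect assignment scores 1 on each.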

6.4.3 Results

Synthetic networks

The results of role discovery on two synthetic networks are shown in Table 6.3.

From these results, some conclusions can be drawn:

• IMM achieves the highest purity on both networks and the highest NMI on SYN 1 (MMTM attains a slightly higher NMI on SYN 2), which indicates the effectiveness of our model.

• Motif-based models (MMTM and IMM) perform better than edge-based models (MMSB and IRM), which demonstrates that higher-order motifs can better capture global role information.

• Nonparametric models (IMM and IRM) can effectively detect the number of roles, which is more useful in practice when the number of roles is unknown.

Real-world networks

The roles discovered on the two real-world networks are visualized in Figures 6.6 and 6.7. These results demonstrate the effectiveness of IMM in identifying roles. In detail,

• Zachary network: The two blue nodes are stars for the left community, the yellow and light blue nodes are stars for the right community1, and the red nodes are peripheries.

• Les Misérables network: The blue nodes (dark and sky blue) are stars, red nodes are cliques, orange nodes are peripheries, and yellow nodes are bridges that link stars and followers.

1 We know a priori that the Zachary karate network consists of two communities.

Table 6.3: Experimental results on the synthetic networks.

Model              SYN 1              SYN 2
                   Purity   NMI       Purity   NMI
MMSB [ABFX08]      0.45     0.2603    0.33     0.1681
IRM [KTG+06]       0.45     0.2452    0.44     0.1758
MMTM [YHX13]       0.60     0.3964    0.48     0.2453
IMM (our model)    0.65     0.4204    0.51     0.2242

6.5 Concluding Remarks

In this chapter, we attempted to answer the research question raised in Section 6.1: Is it feasible to determine the number of roles automatically given a static graph? To answer this question, we proposed a novel generative model, the infinite motif stochastic blockmodel (IMM), for role discovery. IMM is advantageous in two respects: (1) it models higher-order motifs to infer roles, which can effectively capture the global structural information of networks, and (2) it is a nonparametric Bayesian model that infers the number of roles automatically, which makes it more suitable for real-world network analytics. We evaluated IMM on role discovery against state-of-the-art blockmodels, and the results indicate the effectiveness of IMM. In future work, we will explore scalable inference algorithms for IMM, e.g., collapsed variational Bayesian inference, and test our model on larger-scale networks. We will also extend our method to other types of networks, e.g., dynamic networks.

Figure 6.5: Visualization of two synthetic networks. (a) Diagonal block structures. (b) Non-diagonal block structures.

Figure 6.6: Roles on Zachary network.

Figure 6.7: Roles on Les Misérables network.

Chapter 7

Role Discovery on Dynamic Graphs

7.1 Introduction

In this chapter, we aim to answer the research question Q2.3 introduced in Section 1:

Q2.3: How can we effectively discover roles on dynamic graphs by capturing the global structures and dynamics?

Existing role discovery methods focus predominantly on static SNs. For example, non-negative matrix factorization (NMF) based methods, such as RolX [HGER+12] and GLRD [GERD13], cluster a node-feature matrix to discover roles. Stochastic blockmodels employ Bayesian methods for role discovery [ABFX09, FSX09].

In the real world, however, dynamic SNs are ubiquitous, and network structures change over time. State-of-the-art static methods are not easy to extend directly to dynamic SNs. Few attempts have been made to discover roles and analyze role transitions in dynamic SNs, and these attempts either neglect role transition analysis or perform role discovery and role transition learning separately. The evolutionary role clustering method [CRPS15] integrates temporal information into a weighting function for user similarity and clustering; however, role transitions are not analyzed in that work. DBMM [RGNH12, RGNH13] directly uses RolX to discover roles in each SN snapshot and then analyzes role transitions based on the obtained node-role matrices. Although role transitions are analyzed in this model, role discovery and role transition analysis are two separate steps, i.e., role transition information can be learned only after the role discovery process, as shown in Figure 7.1(b). As we show in this chapter, this strategy is inefficient and unstable in practice. These problems also remain in other studies on dynamic role discovery [LGD+13, ATRZ15]. A summary comparison of state-of-the-art role discovery methods can be found in Table 7.1.
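To make the two-step strategy concrete, the sketch below estimates a role transition matrix only after the roles of two consecutive snapshots have been fixed. It is a deliberately simplified illustration that assumes hard role assignments and plain transition counting; it is not the estimation procedure of DBMM or of any other cited method, and the function name `transition_matrix` is hypothetical.

```python
def transition_matrix(roles_t, roles_t1, n_roles):
    """Row-normalized counts of role changes between two snapshots.

    roles_t[i] and roles_t1[i] are the (already discovered) role indices
    of node i in snapshots t and t+1. Entry [a][b] estimates the
    probability that a node with role a at time t has role b at time t+1.
    """
    counts = [[0] * n_roles for _ in range(n_roles)]
    for a, b in zip(roles_t, roles_t1):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Roles that are unused at time t get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    print(transition_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_roles=2))
```

Because the transition estimate depends entirely on the two role assignments computed beforehand, any instability in the per-snapshot role discovery carries over to the estimated transitions.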

Figure 7.1: Examples of roles and role analytics in previous methods and DyNMF. (a) Example of roles and role transitions in SNs. (b) Role detection and role transition analysis in previous studies. (c) Role detection and role transition analysis in DyNMF.

To sum up, there are two issues in previous work: (1) lack of role transition analysis; and (2) inefficiency in role transition analysis. To tackle these issues, in this chapter we propose a new dynamic non-negative matrix factorization (DyNMF) approach. DyNMF is a unified model that discovers roles and role transitions simultaneously in dynamic SNs. An illustration of DyNMF is shown in Figure 7.1(c), where we simultaneously obtain the role matrix of snapshot t + 1 and the role transition from snapshot t to t + 1 by using the information in snapshot t + 1 and the role matrix of snapshot t. In particular, DyNMF addresses the two issues effectively and efficiently:

• For the issue of lack of role transition analysis, DyNMF explicitly introduces a role transition matrix for role transition, where roles and role transitions are modeled in a unified framework. Current and historical views are combined for role analytics. The current view follows RolX to discover roles in the current SN snapshot, while the historical view learns role transitions using past role information and the current SN snapshot.

• For the issue of inefficiency in role transition analysis, DyNMF, as a unified model, supports the simultaneous discovery of both roles and role transitions. In particular, unlike previous studies, it requires only one pass over the data to obtain roles and role transitions.

DyNMF is also advantageous in a further aspect: by combining current and historical views, we can regularize the roles by capturing their temporal smoothness and also reduce uncertainties and inconsistencies between snapshots. Thus, temporal information is better exploited than in [CRPS15], and the discovered roles are more stable than those of DBMM.

All of these advantages of DyNMF are validated through extensive experiments on both synthetic and real-world SNs. The results confirm that DyNMF is advantageous in discovering roles, capturing role transitions, and predicting roles.