Influencing Social Networks Using Strategic Stubborn Nodes


Bas van den Brink 11322195

Bachelor thesis Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam, Faculty of Science, Science Park 904, 1098 XH Amsterdam

Supervisor: Dr. A. Haret, Institute for Logic, Language and Computation, Faculty of Science, University of Amsterdam, Science Park 107, 1098 XG Amsterdam


Abstract

Influencing groups of people to believe (mis)information has been a much discussed topic, especially after the 2016 US election. In this thesis, we use the Social Distance Attachment (SDA) model to generate realistic social networks where nodes stand for agents whose beliefs are represented by a vector of numbers placed in a belief space, and are updated using the DeGroot model. We then introduce an agent that places a strategic stubborn node, called the influencer node, with the purpose of influencing the other nodes in the network towards a given position in the belief space, called the goal belief. We look at five strategies of influencing networks and evaluate them based on how effectively they are able to influence networks with different parameters. We find that strategies which place their influencer node near the goal perform better than the other strategies when the degree of homophily in a network is very low. However, as the homophily increases, the strategies focused on placing the influencer node near other nodes outperform the other strategies.


Chapter 1 Introduction

The spread of misinformation is an ever-present feature of human communities, with examples of ‘fake-news’ including rumors, misunderstandings of facts or targeted campaigns to mislead the public [4]. The potential for such instances of misinformation to damage society has been amplified with the wide-spread adoption of the internet and social media. This increases the pressure on the institutions, both private and public, which are supposed to inform the public about the truth, and raises a question about the extent to which certain agents can influence the collective belief by deliberately spreading false information.

A common way of analyzing the spread of (mis)information in a society is through formal models of social networks, three aspects of which will be of main interest to us here. The first aspect concerns the method for obtaining the social networks, i.e., the graphs representing individuals and the relationships between them. Networks like this can be extracted from real world data. However, they can also be generated automatically based on attributes seen in real world data. In this thesis, the focus will be on the second kind of network, and specifically on so-called random geometric networks (RGNs), in which nodes are assumed to be placed in a multidimensional Euclidean space and the connections are generated based on the distance between nodes [13]. More specifically, we will focus on a particular type of RGN, called the Social Distance Attachment (SDA) model, to simulate realistic social networks of groups of people where individuals make connections based on how similar they are.

The second aspect is a learning component, i.e., an account of how agents in the network update their beliefs in imitation of their peers. We will simulate this aspect using the DeGroot model, in which the belief of an agent is updated by taking a weighted average of the beliefs of its neighbors [6]. This corresponds to a view in which the beliefs of a node are influenced solely by the beliefs of the connected nodes, rather than by observation of the truth or performing experiments. The aim of these models is for beliefs of the nodes within a network to converge on the truth [8], a phenomenon also known as ‘the wisdom of the crowd’.

However, these models are vulnerable to being influenced by well-connected stubborn nodes, which provides us with the third aspect we will focus on. Unconstrained by the learning rules that other agents in the network use to calibrate their beliefs, stubborn nodes can use various strategies to spread their own belief through the network, and this can cause a shift in the collective belief from the truth [12]. When models of social network generation are combined with a learning model and supplemented by the presence of stubborn nodes, a realistic picture emerges of how the spread of information through a network can be influenced by certain strategic agents.

The goal of this thesis is to explore different strategies to persuade social networks into adopting a chosen set of beliefs. Computer simulations are performed with the purpose of discovering the different attributes of a network which influence the effectiveness of the implemented strategies. These correlations between properties of a network and the spread of information can be used to see which societies are easily influenced by information or are quickly polarized. This is done in three different parts:

1. Implement a model which generates networks by placing nodes in a Euclidean space and connecting them based on the distance between the placed nodes. The location of a node represents its beliefs, which are then updated using a learning model in which nodes update their location in the Euclidean space based on the locations of the nodes they are connected to. This update is done in discrete time steps. The model generates the network based on a given network size and the average number of connections per node.

2. Implement an agent which, given a location in the Euclidean space called the goal belief, attempts to influence the average belief of the network towards this location. This is done by placing a stubborn node somewhere in the network and moving it around with each iteration of the learning model. Where the stubborn node is placed is based on which strategy is used.

3. Explore different possible strategies and compare how much different strategies are able to influence the generated networks.

The greatest challenge is developing the model to influence the networks optimally. This is for two reasons. Firstly, the agent will not be given all the information in the network, such as connections between nodes, but only high-level information about the general beliefs of groups within the network. Secondly, because there can be much variation between humans, and thus between social networks, the model will have to be very flexible with the limited information it has.


Contributions. The results are split up in multiple sections which each represent a different form of influencing networks. The first one is non-radical influencing, where the goal of the influencer node is within the range where the nodes of the network are placed. Secondly, radical influencing, where the goal is placed on the edge of the range where nodes are generated. Lastly, very radical influencing, where the goal is placed well outside of the range where nodes are generated. The implemented strategies are given scores for each type of influencing and for both dynamic and non-dynamic networks. The results of this thesis are applicable for strategies of spreading both true information and misinformation to move groups of people towards a (very) radical position and keep a network from radicalizing towards such a position.

Related work. Golub and Jackson [8] study conditions under which the beliefs of the agents in the network converge to the true belief in the DeGroot model. They also show that, under certain assumptions, e.g., vanishing influence of opinion leaders, large social networks are guaranteed to converge to the truth in the long run.

Banerjee et al. [2] look at how information spreads through actual real world social networks. They found that, when spreading information, it is key that the leaders, i.e., the well-connected people in the network, receive this information and spread it to others. The communication centrality of these leaders, a measure of how important someone is inside the network, is the best predictor of how well the information spreads through the network.

Within learning models in social networks, there are two categories: naive models, which are the focus of this thesis, and Bayesian models. Within a Bayesian learning model, nodes in a network are expected to make the optimal choice between certain actions which are associated with a payoff. Following this, the node will choose another action based on this payoff and on the information given about the payoff of its neighboring nodes [1].

Within all the models mentioned above, the goal of the nodes in a network is to converge on some ground truth. In this thesis, however, we examine whether we can cause a network to converge on a different belief or opinion.

Another learning model is the Bounded Confidence (BC) model in [9]. In this model, a node only adopts the beliefs of other nodes if the difference between the two nodes is smaller than the confidence of the node in its current belief. This process of nodes getting their information from nodes with similar beliefs is similar to the process of information spreading used in this thesis.

To further develop the bounded confidence model, Douven and Hegselmann [7] introduce aspects of misinformation seen in the COVID-19 pandemic to the model. One of these aspects is the irresponsible agent, which can dogmatically stick to its opinion.

Outline. This thesis is structured as follows: Firstly, in Chapter 2 several key theoretical concepts are introduced and explained. Secondly, in Chapter 3, the method of the research and the key steps are presented. Following this, in Chapter 4, the resulting data from the method is shown and discussed. And lastly, in Chapter 5, conclusions will be drawn from the data. Some sections of the thesis will also refer to the appendix found attached to the thesis and code located on https://gitlab-fnwi.uva.nl/11322195/influencing-social-networks.


Chapter 2 Theoretical background

2.1 Social networks

A social network is a graph N = (V, E) consisting of a set V of nodes, with |V| = n, and a set E ⊆ V × V of edges. Connections go from one node to another, so any node i can connect with node j, and any node i can also connect with itself. In this thesis, we focus on directed graphs, so while i can be connected to j, j does not have to be connected to i. The edges can also have a weight w attached, which makes the network a weighted network. Weights can represent the intensity of relationships: for instance, close friends are connected with a high weight and acquaintances with a lower weight.

The nodes in a network N represent individuals, groups of individuals or other legal entities such as companies and countries. The edges represent links between nodes, such as social relationships (friendships, romantic relationships and business relationships) [10, 5]. Social networks can be obtained from a real world dataset or generated using various algorithms.

Connections in a social network can also be represented as an adjacency matrix, which is a matrix A where each row $A_i$ is a vector that represents outgoing connections from node i to the other nodes in the network, and each vector $A^T_j$ represents all the incoming connections to node j. If the network is undirected, we have that $A_{i,j} = A_{j,i}$.

Example 1. Figure 2.1 shows the undirected network N = (V, E), with nodes V = {0, 1, 2, 3, 4} and edges E = {(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)} ∪ {(1, 0), (2, 0), (3, 1), (3, 2), (4, 2), (4, 3)}. The adjacency matrix of N is also shown.

$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{pmatrix}$$

Figure 2.1: A network and its adjacency matrix

Clusters. Clusters are groups of nodes within a graph that are well-connected with each other and less connected with nodes outside of the group. As the degree of clustering increases, individual clusters can be better distinguished, as can be seen in Figure 2.2. Clustering also has an effect on the spread of information within a network. Namely, information might not spread rapidly through a network if the degree of clustering is too high, as can be seen in [2].

Figure 2.2: Networks with a high and low degree of clustering

Homophily. Homophily is a term that describes the tendency of people to form connections with similar people. This can occur based on biological attributes, such as sex and ethnicity. However, it also applies to more abstract features such as religion and class [11]. In networks with high homophily, most connections are between nodes with similar features, while there are few connections between nodes with radically different features.

There are multiple causes of homophily. The simplest cause is space: people tend to make connections with people that are physically close to them. Other causes of homophily are families, schools and workplaces: people make connections with family members and with the people they meet at work or in education [11]. Homophily can also be seen in social networks. For instance, segregation based on race inside a high school is shown in [5], where it was concluded that children mostly had friends of the same race as themselves.

2.2 Generating Social networks

One aspect of research into social networks is the generation of random social networks. This is done so researchers have more data to test and train their models on and to better understand the real social networks these random networks are based on. To do this, the random networks must have attributes similar to those of real networks. For example, the simplest way to generate random non-weighted and non-directional connections between nodes is to use a binomial distribution to determine if a connection between two nodes should be made. As can be seen in Figure 2.3, the number of connections increases with the chance of a connection P. However, even though this is a simple solution for randomly generating links in a social network, it does not produce the same attributes as a real social network. For example, the model used to generate the networks in Figure 2.3 does not produce networks which show attributes like clustering. The number of links per node also does not match that of realistic networks, which follows a certain distribution depending on the kind of social network.
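The binomial approach described above can be sketched in a few lines. This is only an illustration and not the code used to produce Figure 2.3; the function name `random_binomial_network` and the `seed` parameter are assumptions introduced for the example.

```python
import numpy as np


def random_binomial_network(n, p, seed=None):
    """Undirected, unweighted network: each pair of nodes is linked with probability p."""
    rng = np.random.default_rng(seed)
    # One Bernoulli trial per unordered pair (strict upper triangle, so no self-loops).
    upper = np.triu(rng.random((n, n)) < p, k=1)
    # Mirror the upper triangle so that A[i, j] == A[j, i].
    return (upper | upper.T).astype(int)


A = random_binomial_network(n=20, p=0.2, seed=0)
print("number of edges:", A.sum() // 2)
print("links per node:", A.sum(axis=1))
```

Because every pair is treated independently, nothing in this procedure favors links between similar nodes, which is why it does not reproduce clustering.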

2.3 Opinionated Nodes

Within the field of social networks, one particular area of study is the spread of information and beliefs. Here, each node in a network is assigned a belief, which in this thesis is a vector of numbers represented as a point in a Euclidean space. These beliefs can be visualized in the form of a graph where each node is placed on the point in the Euclidean space where its beliefs lie. One of the ways to generate connections between nodes placed in such a belief space is using Random Geometric Models to generate Random Geometric Networks. In this case, the belief or opinion of every node is, when looking at a two-dimensional space, represented as a pair b = (x, y) with x and y being numbers. The components x and y both represent a belief of the node in a different dimension. This location in the belief space can then be visually represented as a point in a graph.

Figure 2.3: Randomly generated connections

Random Geometric Networks (RGNs). Random geometric models (RGMs) are models in which the nodes in a social network are embedded in a metric space, typically $\mathbb{R}^n$, and edges are generated based on the distance between nodes. The distance between nodes $x_i$ and $x_j$ is defined as $d(x_i, x_j)$, where d can be any distance function, such as the Manhattan distance or the Euclidean distance. We also take a distance threshold b for the maximum distance between connected nodes. These models address the problem mentioned before with generating social networks.

In the simplest version of an RGN, we take a network N with a set of nodes x placed in a multi-dimensional social space $\mathbb{R}^n$. A connection between any two nodes $x_i$ and $x_j$ is made if $d(x_i, x_j) < b$. Following this, the adjacency matrix of the network can be filled in as follows:

$$A_{i,j} = \begin{cases} 1, & \text{if } d(x_i, x_j) < b \\ 0, & \text{otherwise} \end{cases}$$
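The thresholding rule above can be illustrated as follows. This is a minimal sketch assuming the Euclidean distance; the function name `rgn_adjacency` is hypothetical, and leaving out self-loops is a choice made for the example only.

```python
import numpy as np


def rgn_adjacency(points, b):
    """Connect nodes i and j whenever their Euclidean distance is below b."""
    points = np.asarray(points, dtype=float)
    # Pairwise differences via broadcasting, then the Euclidean norm of each.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    A = (dist < b).astype(int)
    # Assumption for this sketch: self-loops are left out, although d(x_i, x_i) = 0 < b.
    np.fill_diagonal(A, 0)
    return A


points = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.9)]
A = rgn_adjacency(points, b=0.2)
print(A)
```

In this small example only the first two nodes lie within distance 0.2 of each other, so they are the only connected pair.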

An important RGM, and the model on which we will base our work, is the Social Distance Attachment (SDA) model. The SDA model generates the connections of an RGN based on a given degree of homophily and a maximum distance which is not a hard cut-off point, but an indication of the distance at which the chance of a connection is 0.5. Given a network N with a set of nodes $\{x_0, x_1, x_2, \ldots\}$, a degree of homophily α, and a characteristic distance b, the probability of a connection between nodes $x_i$ and $x_j$ is determined using Equation 2.1.

$$p_{i,j} = \frac{1}{1 + [b^{-1} d(x_i, x_j)]^{\alpha}} \tag{2.1}$$

As can be seen in Figure 2.4, the value of the characteristic distance b determines the point where $p_{i,j}$ is at 0.5, and the value of α determines the rate at which $p_{i,j}$ decreases as the distance between nodes increases [3].
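To make the roles of b and α in Equation 2.1 concrete, the probability can be evaluated for a few distances. The function name `sda_probability` and the chosen values are assumptions made for this illustration.

```python
import numpy as np


def sda_probability(d, b, alpha):
    """Equation 2.1: p = 1 / (1 + (d / b) ** alpha)."""
    d = np.asarray(d, dtype=float)
    return 1.0 / (1.0 + (d / b) ** alpha)


b = 0.25
distances = [0.125, 0.25, 0.5]
for alpha in (2, 8):
    # For any alpha, the probability at d = b is 0.5; a larger alpha gives a sharper drop.
    print(alpha, sda_probability(distances, b, alpha))
```

With α = 2 the three probabilities are 0.8, 0.5 and 0.2, while with α = 8 the curve is much closer to a hard threshold at d = b.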

One interpretation of the geometric space where these RGNs lie is that the belief of a person can be represented as a multidimensional vector of numbers, which is a point in the Euclidean space. This can thus be visualized as a graph where each node is represented by a point located on the position given by its belief vector. In this thesis, we assume that the Euclidean space in which the networks are generated is the belief space B = [0, 1] × [0, 1], with one exception to be presented later: very radical influencing.

Figure 2.4: Probability between connections in the SDA model

Learning networks. It is desirable, when making a model simulating the beliefs of people, to make the beliefs of nodes change over time, just as is the case with humans. This can be done using the DeGroot model. The DeGroot model achieves this by having each node in a network imitate the beliefs of its connected peers: at each discrete time step, the belief of a node is set to the weighted average belief of all connected nodes. This belief is represented as a single number. If we take a network N, a set of discrete time points {0, 1, ...}, and write $b_i(t)$ for the belief of agent i at time t, then the updated belief is:

$$b_i(t + 1) = \sum_{j=0}^{n} A_{ij} b_j(t)$$

where $A_{ij}$ is the weight of the connection from node i to node j.

In other words, the updated belief of a node is a weighted average of the beliefs of the node’s neighbors. Although the DeGroot model originally defined the belief as a single number, in this thesis we choose to represent it as a vector of numbers, which is a generally accepted practice.
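A single DeGroot update with vector-valued beliefs amounts to one matrix product. The sketch below assumes that every row of the weight matrix sums to 1, so that the result is a weighted average; the matrix `W`, the initial beliefs and the function name `degroot_step` are made up for the illustration.

```python
import numpy as np


def degroot_step(W, beliefs):
    """One DeGroot update; row i of W holds the weights node i puts on every node."""
    return W @ beliefs


# Assumed example: three nodes on a line, each row of W sums to 1.
W = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
beliefs = np.array([[0.0, 0.0],
                    [0.5, 1.0],
                    [1.0, 0.5]])

for _ in range(50):
    beliefs = degroot_step(W, beliefs)
print(beliefs)
```

Since no node in this example is stubborn and the network is connected, repeated updates pull all rows of `beliefs` towards one shared position.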

Stubborn agents. Within social networks, there is one kind of agent which is referred to as a stubborn agent. This agent is characterized by the fact that while it can spread its beliefs to other nodes, it will not be influenced by other nodes itself. This causes this agent to have a large influence on the belief to which a network converges [12]. A stubborn node is a node which has no outgoing connections except to itself. So, when we take a network N with a set of nodes $\{x_0, x_1, \ldots, x_n\}$, a node $x_i$ is stubborn if and only if $A_{ij} = 0$ for every j where $j \neq i$.


Chapter 3 Implementation

In this chapter, we study the methods available to a stubborn agent whose purpose is to get the nodes in a social network to adopt an opinion as much as possible.

We will assume that the stubborn agent, whom we call an influencer, does not have the power to decide which other nodes are influenced by it. Rather, it can only broadcast a belief to the rest of the network. Based on this advertised opinion, the other nodes decide if they are influenced by this influencer using the SDA model. So the influencer can lie about its goals in order to get the attention of other nodes and in this way attract them towards its own actual goal opinion.

Our approach is divided into three sections:

1. Implement a model which generates a random geometric network and uses a learning model to update the beliefs of the nodes in the network.

2. Implement an agent which places a stubborn node in the network to influence the network towards a given location in the Euclidean space by moving the node towards this given location.

3. Evaluate different strategies of placing the stubborn node in the network.

Our model is implemented in Python 3.6 using the NetworkX library for its implementation of networks, numpy for the mathematical formulas and sklearn for its implementation of the k-means clustering algorithm. Our code base is based on the existing code developed for Talaga and Nowak [14].

3.1 Generating the network

Our first aim is to be able to generate initial realistic social networks. This is done using the SDA model described in Section 2.3. This process is broken up into three steps:

1. Generate nodes and place them in a Euclidean space.

2. Generate connections using the Social Distance Attachment (SDA) model.

3. Implement a learning model to update the locations of all nodes in the Euclidean space. This step is iterated on for each time interval.

Placing the nodes. Firstly, n nodes are generated. We will assume that the nodes are situated in the two-dimensional belief space B = [0, 1] × [0, 1], i.e., the belief of agent i is a vector $b_i = (x_i, y_i)$ with the constraint that $0 \leq x_i, y_i \leq 1$. The number of dimensions is set to two because nodes placed in a 2-dimensional space can be visualized using a grid. The nodes of a network are placed in a given number of clusters c; this is done to resemble the real world, where groups of people have similar beliefs to each other. A cluster is generated by placing the center of the cluster at a random location within the belief space B and placing the nodes randomly around the center. An example of the generation of the nodes can be seen in Figure 3.1, where three clusters of nodes are generated. Two of these clusters are near each other and one of them is very isolated from the others.

Figure 3.1: An example of clustered node placement with the parameters: n = 20, c = 3, B = [0, 1] × [0, 1]
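One possible way to produce such a clustered placement is sketched below. The text does not specify how nodes are scattered around a cluster center, so the Gaussian noise, the `spread` parameter, the uniform assignment of nodes to clusters and the clipping to B are all assumptions of this sketch, as is the function name `place_clustered_nodes`.

```python
import numpy as np


def place_clustered_nodes(n, c, spread=0.05, seed=None):
    """Place n nodes around c random cluster centers inside B = [0, 1] x [0, 1]."""
    rng = np.random.default_rng(seed)
    # Cluster centers are drawn uniformly from the belief space.
    centers = rng.random((c, 2))
    # Assumption: every node is assigned to a cluster uniformly at random.
    labels = rng.integers(0, c, size=n)
    # Assumption: nodes are scattered around their center with Gaussian noise.
    beliefs = centers[labels] + rng.normal(0.0, spread, size=(n, 2))
    # Keep every belief inside the belief space.
    return np.clip(beliefs, 0.0, 1.0), labels


beliefs, labels = place_clustered_nodes(n=20, c=3, seed=0)
print(beliefs[:5])
```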


Connecting the nodes. After the nodes are placed, they are connected using the SDA model as implemented in [14]. The SDA model was chosen to connect the nodes because people generally get their information from sources which have a similar belief to themselves [15], and the SDA model emulates this well. The SDA model is given the degree of homophily α and the average number of connections for each node k. The characteristic distance b of the model is determined using optimization techniques to get as close as possible to the average number of connections k. An example of the generation of these connections is shown in Figure 3.2, where nodes generally have more connections with nodes which are close by and fewer with far away nodes.

Figure 3.2: An example of the SDA model generating connections between nodes with the following parameters: k = 3, α = 2
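The calibration of b towards a target average number of connections k can be illustrated with a simple bisection search. This is a stand-in chosen for the example, not the optimization routine of the implementation in [14]; the function names, the search bounds and the use of the expected (rather than sampled) number of connections are all assumptions.

```python
import numpy as np


def expected_degree(points, b, alpha):
    """Average expected number of connections per node under Equation 2.1."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    p = 1.0 / (1.0 + (dist / b) ** alpha)
    np.fill_diagonal(p, 0.0)  # self-connections are not counted here
    return p.sum() / len(points)


def calibrate_b(points, k, alpha, lo=1e-6, hi=10.0, iters=60):
    """Bisection on b: the expected degree grows monotonically with b."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_degree(points, mid, alpha) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
points = rng.random((30, 2))
b = calibrate_b(points, k=3, alpha=2)
print(b, expected_degree(points, b, alpha=2))
```

Bisection works here because, for a fixed α > 0, every connection probability increases with b, so the expected degree crosses k exactly once within the search interval.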

Updating the nodes. After the nodes are generated with their initial location and connections, we start to update this information. This is done in two steps:

1. Perform one iteration of the DeGroot model. In this step, the belief vector (or location in Euclidean space) of every node is updated, component-wise, using the DeGroot model as described in Section 2.3. For the purposes of this thesis, a slightly altered version of the DeGroot model was used. The difference between the two approaches is that the resulting belief is normalized according to the number of neighbors a node has. This removes the necessity to normalize the adjacency matrix each time connections are altered. Thus, if $b_i(t) = (x_i(t), y_i(t))$ is the belief of agent i at time t, then the updated belief is given in Equation 3.1:

$$b_i(t + 1) = \frac{\sum_{j=0}^{n} A_{ij} b_j(t)}{\sum_{j=0}^{n} A_{ij}} \tag{3.1}$$

where $A_{ij}$ is the weight of the connection between nodes i and j.

Although this is computationally more expensive, it makes it easier to add nodes and change connections within the network, since the weights of all connections do not have to be updated each time a connection is made or broken. An example of the network being updated by the DeGroot model is shown in Figure 3.3.

2. If the value d, indicating that the network is dynamic, is set to True, remove all connections between the nodes and generate them again using the SDA model with the new locations. If d is False, the connections stay the same in each iteration.
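The normalized update of Equation 3.1 can be sketched as follows. The function name `normalized_degroot_step`, the example network and the rule that a node without any connection keeps its belief are assumptions of this sketch; the regeneration of connections for dynamic networks is not shown.

```python
import numpy as np


def normalized_degroot_step(A, beliefs):
    """Equation 3.1: weighted sum of neighbor beliefs divided by the row sum of A."""
    A = np.asarray(A, dtype=float)
    row_sums = A.sum(axis=1, keepdims=True)
    # Avoid division by zero for nodes without any connection.
    safe = np.where(row_sums > 0, row_sums, 1.0)
    updated = (A @ beliefs) / safe
    # Assumption: a node without connections keeps its current belief.
    return np.where(row_sums > 0, updated, beliefs)


# Unnormalized 0/1 adjacency matrix; node 2 only listens to itself (stubborn).
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
beliefs = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [1.0, 1.0]])
new_beliefs = normalized_degroot_step(A, beliefs)
print(new_beliefs)
```

Because the division happens inside the update, rows of `A` can be changed freely between iterations without renormalizing the matrix.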

3.2 Adding the influencer

After implementing the model for the generation and the updating of the network, we implement an agent who can influence this process. This is achieved in multiple steps. Firstly, the agent is given a goal g, which is a vector or a location in the Euclidean space. Secondly, the agent determines a location to place an influencer node somewhere in the Euclidean space. The influencer node is a stubborn node which has a goal to influence the network towards and an advertised belief, which is the belief that the other nodes believe the influencer node holds. This influencer node represents a person (politician or media personality) or organisation (news network or advertisement agency) who wants to convince people in a society that a certain goal belief is true by strategically spreading misinformation, the advertised belief, to the people in the society in a way which convinces as many people as possible.

Small scale model. To better understand the influencing of nodes in a social network, we first take this small scale example of one strategic stubborn node influencing a normal node.

Example 2. We take one node n which is randomly placed in the interval R = [0, 1], so that n ∈ R. A stubborn node s is also placed in R, at position s = 0. This can be seen in Figure 3.4. This represents one person who holds a random belief within a belief space and an influencer who has the goal of influencing the person into believing 0 by advertising the belief of 0.

Figure 3.3: An example of the DeGroot model changing the beliefs of a network with each iteration

The network changes as follows: at each time step i, a connection between node n and node s can be made based on Equation 2.1. Following this, the belief of each node is updated using the DeGroot model described in Chapter 2. In this case, since the distance between n and s = 0 is simply n, Equation 2.1 reduces to

$$p_{n,s} = \frac{1}{1 + [b^{-1} n]^{\alpha}}$$

Figure 3.4: Small scale model of a node being influenced by the influencer node

The belief of node n after iteration i is then as follows. If the connection between n and s has not been made, the belief stays the same, since the only connection of node n is with itself ($n(i + 1) = n(i)$). If the connection between n and s exists, node n gives equal weight ($\frac{1}{2}$) to itself and to its neighbor, so its belief will be

$$n(i + 1) = \frac{1}{2} s + \frac{1}{2} n(i)$$

Since s = 0, this reduces to

$$n(i + 1) = \frac{1}{2} n(i)$$

Following this, the expected value of n can also be determined, as seen in Equation 3.2:

$$E[n(i + 1)] = (1 - p_{n(i),s})\, n(i) + p_{n(i),s}\, \frac{1}{2} n(i) \tag{3.2}$$

It can be shown that, given enough iterations and if $p_{n,s} \neq 0$, the position of node n will eventually converge to s.
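The small scale model can be simulated directly to see this behavior. The sketch below follows the update rules of Example 2; the function name `simulate_small_scale` and the parameter values are assumptions chosen for the illustration.

```python
import numpy as np


def simulate_small_scale(n0, b, alpha, steps, seed=None):
    """Simulate one normal node n and a stubborn node s = 0 on the interval [0, 1]."""
    rng = np.random.default_rng(seed)
    n = float(n0)
    trajectory = [n]
    for _ in range(steps):
        # Equation 2.1 with d(n, s) = n, since s sits at 0.
        p = 1.0 / (1.0 + (n / b) ** alpha)
        if rng.random() < p:
            # Connected: equal weight on its own belief and on s = 0.
            n = 0.5 * n
        # Not connected: the belief stays the same.
        trajectory.append(n)
    return trajectory


trajectory = simulate_small_scale(n0=0.8, b=0.5, alpha=2, steps=50, seed=1)
print(trajectory[0], trajectory[-1])
```

Every halving moves n closer to s, which in turn raises the connection probability for the next step, so the pull of the influencer strengthens as the node approaches it.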

This small scale example leads to two insights about what improves the ability to influence a node in a social network.

1. A higher probability of connection leads to a higher chance that the node is influenced by the influencer node. This probability is increased by decreasing the distance between the normal node and the influencer node.


2. The further the advertised belief of the influencer node is towards the goal of the influencer node, the more the normal node is influenced towards the goal if the connection is made.

These two factors lead to two goals when developing strategies for picking a starting position sp, which is a vector representing a position in the Euclidean space where the nodes of a network are placed.

1. Due to the constraints of the SDA model, the agent is not able to directly connect the influencer node with the regular nodes in the network. However, the probability of making connections with other nodes increases the closer the influencer is to them. Thus, we want the agent to place the influencer node near as many nodes as possible, which leads to the influencer node being connected to as many nodes as possible. This is good for influencing networks, since the more nodes the influencer node is connected to, the more nodes it can influence.

2. Placing the influencer node as close as possible to the goal. This causes nodes which connect to the influencer node to be further influenced towards the goal with each iteration of the DeGroot model.

For this thesis, we have looked at five possible strategies for the initial placement of the influencer node, presented below. The results of how these strategies perform in influencing social networks are discussed in Chapter 4.

Start at goal. The influencer node is placed on the spot where the agent wants the collective belief to converge. This strategy is solely based on goal 2 mentioned above. Although the influencer node is not intentionally placed near other nodes in the network, and thus presumably makes fewer connections than strategies which do so, the nodes which are connected to the influencer node will move more in the direction of the goal with each iteration. Intuitively, this means that when an influencer wants to convince a group of people of a certain belief, it will advertise itself based on that belief and not change its advertised belief over time. More formally, given a goal g, which is a vector representing a position in the Euclidean space in which the nodes of the network are placed, component u of the starting position sp is equal to component u of the goal g, or:

$$sp_u = g_u$$

Start at largest cluster. Using this strategy, clusters of nodes are determined using k-means clustering as implemented in the Python package sklearn. Then the average position of the nodes in the largest cluster is chosen as the starting position. This strategy is focused on placing the node as close as possible to a subset of the network and thus making as many connections as possible. Intuitively, this strategy means that if an influencer wants to convince people of a certain belief, it finds the largest group of people with similar beliefs and tries to influence that group by advertising a belief similar to theirs. Over time, the influencer tries to move this large group towards its intended goal belief. More formally, suppose we are given a goal g, which is a vector representing a position in the Euclidean space in which the nodes of the network are placed, and a set of N nodes which are each part of one of the clusters in a set of clusters C. The largest cluster l is determined by:

$$l = \arg\max_{C_i \in C} \sum_{c \in C_i} 1$$

The component u of the starting position sp is equal to the average of component u of the beliefs i of the nodes in l, or:

$$sp_u = \frac{1}{|l|} \sum_{i \in l} i_u$$

where |l| is the number of nodes in l.

Start at mean belief The influencer node is given the average belief of all the nodes in the entire network when using this strategy. This strategy is also focused on being relatively nearby all other nodes and thus making as many connections as possible with the same reasoning as the start at largest cluster strategy. How-ever, the start at largest cluster strategy which is placed closer to a subsection of the nodes of the network while the start at mean belief strategy is less close but still relatively near all nodes of the network. Intuitively this means that when an influencer want to convince people of a certain belief, it will start by advertising itself based on the average belief of the group of people hoping that more people will listen to the influencer and over time move towards the intended goal belief. Or, more formally, the component u of starting position sp can be calculated for a network with a set of N nodes, with each node in this set A having a location in the Euclidean space which represents their belief.

\[ sp_u = \frac{\sum_{i \in A} i_u}{N} \]

Start at closest cluster Using this strategy, clusters of nodes are determined using k-means. Then the average position of the nodes in the cluster closest to the goal is chosen as the starting position. This strategy is based on a combination of both goals mentioned above: being placed close to nodes in the network, but also relatively close to the goal of the agent. Intuitively this would mean that if an influencer wants to convince people to believe a certain thing, it would find the group of people who already believe something similar and advertise their belief to get this group to connect with the influencer. The influencer would then over time try to move this group towards the intended goal belief. More formally, the start at closest cluster strategy is given a set of clusters C whose centers x are locations in the Euclidean space where the nodes are placed. The starting position sp is equal to the center which has the smallest distance to the goal, or:

\[ sp = \min_{x \in C} \lVert x - g \rVert \]

Start at closest node Here, the starting position is equal to the position of the node in the social network that is closest to the goal. This strategy is also a combination of the two goals. However, it leans slightly more towards the second goal than the start at closest cluster strategy, because the influencer node is placed slightly closer to the goal of the agent. Practically this would mean that if an influencer wanted to convince people to believe a certain thing, it would find the person whose beliefs are the most similar to the goal belief and adopt and advertise that belief. Then, over time, the influencer would move this advertised belief towards the intended goal belief. More formally, given a network with a set of nodes A, where each node has a position x in a Euclidean space which represents its beliefs, the starting position sp is equal to the location of the node which has the smallest distance to the goal g:

\[ sp_u = \min_{x \in A} \lVert x_u - g_u \rVert \]
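The four data-driven starting positions described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this thesis: the function names are hypothetical, the clusters are assumed to be given already (for example as the output of k-means) as lists of belief tuples, and the largest-cluster average divides by the size of that cluster.

```python
import math


def mean_point(points):
    """Component-wise average of a non-empty list of equally sized tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[u] for p in points) / n for u in range(dims))


def start_at_mean(beliefs):
    """Average belief of all regular nodes in the network."""
    return mean_point(beliefs)


def start_at_largest_cluster(clusters):
    """Average belief of the cluster containing the most nodes.

    Assumption: `clusters` is a list of lists of belief tuples and the
    average is taken over the members of the largest cluster only.
    """
    largest = max(clusters, key=len)
    return mean_point(largest)


def start_at_closest_cluster(clusters, goal):
    """Center of the cluster whose center is nearest to the goal."""
    centers = [mean_point(cluster) for cluster in clusters]
    return min(centers, key=lambda x: math.dist(x, goal))


def start_at_closest_node(beliefs, goal):
    """Belief of the single node that is nearest to the goal."""
    return min(beliefs, key=lambda x: math.dist(x, goal))
```

Note that the last two functions return the position that minimises the distance to the goal, rather than the minimal distance itself.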

The strategies which involve changing the advertised belief over time do so linearly, moving towards the goal belief with each iteration. The advertised belief a of an influencer node, which is a position in a two-dimensional Euclidean space, can be calculated for each iteration number i, given a starting position sp and a goal g in the same Euclidean space and the total number of iterations performed t, as follows:

\[ a(i) = \frac{g - sp}{t}\, i + sp \]

Here, i is the number of the current iteration in the DeGroot model and t is the total number of iterations that are performed.
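As a small illustration of this linear movement, the following Python sketch evaluates the advertised belief for a given iteration. The function name and argument order are hypothetical and chosen only for this example.

```python
def advertised_belief(i, start, goal, t):
    """Advertised belief a(i) = (g - sp) / t * i + sp, per component.

    i     -- number of the current iteration (0 <= i <= t)
    start -- starting position sp as a tuple
    goal  -- goal belief g as a tuple
    t     -- total number of iterations
    """
    return tuple(s + (g - s) * i / t for s, g in zip(start, goal))
```

At i = 0 the influencer advertises its starting position, and at i = t it advertises exactly the goal belief.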

A summary of our notations of different parameters of the model is given in Table 3.1.


| Model parameter | Notation |
| --- | --- |
| Amount of nodes in a network | n |
| Number of clusters | c |
| Amount of DeGroot iterations | t |
| Average number of connections per node | k |
| Dynamic | d |
| Amount of experiments per data point | e |
| Goal | g |
| Belief space | B |
| Degree of homophily | α |
| Starting position of the influencer node | sp |

Table 3.1: Model parameters and their notation

3.3 Evaluating the strategies

To evaluate the effectiveness of the strategies, we compare them by how much they are able to move the average belief of a network. The average belief b of a network is calculated by taking the average value of each component u of the beliefs of all nodes in the network. For a network N with a set of nodes C containing n nodes this is done as follows:

\[ b_u = \frac{\sum_{i \in C} i_u}{n} \]

This average belief of a network is calculated after t iterations of the DeGroot algorithm. If b = (b1, b2) is the average belief of the network and g = (g1, g2) is the goal, then a score s can be calculated for every strategy based on the Euclidean distance between g and b, as follows:

\[ s = \sqrt{\sum_{i=1}^{2} (b_i - g_i)^2} \tag{3.3} \]
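The score can be illustrated with a short Python sketch. The function names are hypothetical; the sketch simply averages the beliefs of the regular nodes per component and measures the Euclidean distance from that average to the goal, so a lower score means the network ended up closer to the goal.

```python
import math


def average_belief(beliefs):
    """Component-wise average belief b of a list of belief tuples."""
    n = len(beliefs)
    dims = len(beliefs[0])
    return tuple(sum(belief[u] for belief in beliefs) / n for u in range(dims))


def score(beliefs, goal):
    """Euclidean distance between the average belief and the goal."""
    return math.dist(average_belief(beliefs), goal)
```

For example, a network whose two nodes hold the beliefs (0, 0) and (1, 1) has the average belief (0.5, 0.5), so its score for the goal (0.5, 0.5) is zero.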

These results are compared to simulations with the same node positions, but in which no influencer node was placed. The different strategies are also compared to each other using the same metric. This experiment is performed multiple times to take the probabilistic nature of the generation of the networks and connections into account. Moreover, the experiment is performed on networks with different properties: a low or high degree of homophily, different network sizes, and dynamic or non-dynamic connections.

The strategies are tested in cases where the goal of the agent is to have the network converge on a position which is not very radical, somewhere within the space where the nodes are placed. The strategies are also evaluated when the goal is exactly on the edge of the space where the nodes of the network are placed, and when the goal is well outside the space where the nodes are placed.


Chapter 4 Results

In this chapter the results of the evaluation of the strategies discussed in Section 3.2 are shown and discussed. These results are split up into three distinct sections based on the goal of the influencing agent. The scores of the strategies are plotted both against the degree of homophily in the network and against the size of the network. The first kind of simulation consists of the following steps:

1. A network consisting of n = 20 nodes is generated. Each node is placed in a Euclidean space within the belief space B.

2. An influencer node is placed in copies of this network using each strategy described in Section 3.2.

3. All the nodes in the network including the influencer node are connected to each other using the SDA model. Each node has an average of k = 4 connections. This is done for varying degrees of homophily α.

4. t = 10 iterations of the DeGroot model are performed. In each iteration, if d = True, the connections are generated again using the SDA model, and the influencer node moves its advertised belief towards the goal g if necessary.

5. The score is calculated using the evaluation metric described in Section 3.3.

This is done a specific number of times e and the average score is plotted against the degree of homophily. The second kind of simulation is almost the same. However, the degree of homophily α is static and the average score is plotted against a varying number of nodes in the network n.
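The steps above can be sketched as a single simulation run in Python. This is a rough sketch under several assumptions, not the code used for the experiments: all names are hypothetical; the connection probability 1 / (1 + (distance / b)^α) is the form commonly given for the SDA model, with the characteristic distance b treated here as a free parameter instead of being calibrated to an average degree k; and each DeGroot update gives equal weight to a node's own belief and the beliefs of its neighbours.

```python
import math
import random


def sda_probability(distance, b, alpha):
    """Assumed SDA connection probability for two nodes at a given distance."""
    return 1.0 / (1.0 + (distance / b) ** alpha)


def sample_edges(positions, b, alpha, rng):
    """Draw an undirected graph; returns one neighbour set per node."""
    neighbours = [set() for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            p = sda_probability(math.dist(positions[i], positions[j]), b, alpha)
            if rng.random() < p:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours


def degroot_step(beliefs, neighbours, stubborn):
    """One equal-weight DeGroot update; the stubborn node keeps its belief."""
    updated = []
    for i, belief in enumerate(beliefs):
        if i == stubborn:
            updated.append(belief)
            continue
        group = [belief] + [beliefs[j] for j in neighbours[i]]
        updated.append(tuple(sum(p[u] for p in group) / len(group)
                             for u in range(len(belief))))
    return updated


def simulate(beliefs, start, goal, t, b, alpha, dynamic, seed=0):
    """Run t iterations and return the final beliefs of the regular nodes."""
    rng = random.Random(seed)
    beliefs = list(beliefs) + [start]          # influencer is the last node
    stubborn = len(beliefs) - 1
    neighbours = sample_edges(beliefs, b, alpha, rng)
    for i in range(1, t + 1):
        if dynamic:                            # d = True: redraw connections
            neighbours = sample_edges(beliefs, b, alpha, rng)
        beliefs = degroot_step(beliefs, neighbours, stubborn)
        # Move the advertised belief linearly from the start towards the goal.
        beliefs[stubborn] = tuple(s + (g - s) * i / t
                                  for s, g in zip(start, goal))
    return beliefs[:-1]
```

A static strategy such as start at goal corresponds to passing the same position for `start` and `goal`, in which case the advertised belief never changes.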

Firstly, the evaluation of non-radical influencing is shown. In this section, the goal of the agent is placed in the same subspace as the regular nodes. Secondly, radical influencing is evaluated. Radical influencing is when the goal of the influencing agent is placed on the edges of the subspace where all regular nodes of the network are placed. Thirdly, the results of very radical influencing are shown and discussed. Very radical influencing is when the goal of the influencing agent is well outside of the subspace where all regular nodes are placed.

4.1 Non-radical influencing

With non-radical influencing, the goal of the agent is to influence the network towards a goal which is a random point in the range where all regular nodes in the network are placed. In this case the boundaries for all node placement are B = [0, 1] × [0, 1], and the goal g = (x, y) satisfies 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. Intuitively, this is a case where an influencer wants to influence people towards a belief that is fairly moderate and within the range of civil discourse. All strategies of influencing the network, as can be expected, score lower (and thus better) than not placing any influencer node in the network. This is the case for any network.

Figure 4.1: Scores of the different strategies using the following parameters: n = 20, t = 20, k = 3, d = False, e = 100, g = (x, y) where 0 ≤ x, y ≤ 1, B = [0, 1] × [0, 1]

When evaluating strategies on networks of varying degrees of homophily for non-dynamic networks, where d = False, which strategy performs best depends on the degree of homophily in the network. When the homophily in generated networks is high, which means that nodes which are closer in belief to one another have a higher chance of connecting than nodes which are far away from each other, the start at largest center strategy outperforms all other strategies. However, when the homophily is low and the distance between nodes does not play as much of a role when making connections, start at goal, start at closest node and start at closest cluster are the best performing strategies, with start at goal performing the best. The start at mean strategy is among the worst performing strategies for all degrees of homophily. This is illustrated in Figure 4.1.

Figure 4.2: Scores of the different strategies using the following parameters: n = 20, t = 20, k = 3, d = True, e = 100, g = (x, y) where 0 ≤ x, y ≤ 1, B = [0, 1] × [0, 1]

Similar results can be observed when evaluating dynamic networks, where d = True. When the degree of homophily in a network is low, the start at largest center and start at mean strategies perform considerably worse than the start at goal, start at closest node and start at closest center strategies. However, as the degree of homophily increases, the difference in score between the strategies becomes smaller and almost negligible. This can be seen in Figure 4.2.

Which strategy performs better does not depend on the size of the network. Although larger networks are influenced less by a single influencer node, this holds equally for all strategies, as can be seen in Figure 4.3.


Figure 4.3: Scores of the different strategies using the following parameters: α = ∞, t = 20, k = 3, d = False, e = 10, g = (x, y) where 0 ≤ x, y ≤ 1, B = [0, 1] × [0, 1]

4.2 Radical influencing

With radical influencing, the goal of the agent is to influence the nodes in a network towards one of the corners of the range where the regular nodes of the network are placed. Because the belief space where the nodes are placed is B = [0, 1] × [0, 1], the goal is randomly chosen to be one of the corners, i.e., g ∈ {(0, 0), (0, 1), (1, 0), (1, 1)}.

The results of these simulations are almost exactly the same as with non-radical influencing in terms of which strategies perform the best. The only difference is that all strategies and the baseline perform worse by a similar amount. The cause of this is that the goal in the case of radical influencing is generally further away from the center of the range where the nodes are placed, which is where the network, on average, converges if nothing is influencing the nodes. This can be seen in Figures 4.4-4.6.


Figure 4.4: Scores of the different strategies using the following parameters: n = 20, t = 20, k = 3, d = False, e = 100, g ∈ {(0, 0), (0, 1), (1, 0), (1, 1)}, B = [0, 1] × [0, 1]

4.3 Very radical influencing

Lastly, we want to simulate a case where the network is influenced towards a very radical belief. This is done by placing the goal very far away from the belief space B where all regular nodes are initially placed. The distance between the goal and the range in which nodes are placed is set at 20. This results in the goal being set at g ∈ {(−20, −20), (−20, 20), (20, −20), (20, 20)}. Another way in which this could have been done is by limiting the belief space B where the regular nodes are generated. Then, the influencer node could be placed within the normal constraints of the belief space. However, we believe that keeping the constraints for placing the regular nodes the same in all cases makes it easier to compare the results of very radical influencing with the other forms of influencing.

When the network is not dynamic (d = False) and the homophily is low, the start at goal strategy outperforms every other strategy by a large margin. The other strategies, start at largest center, start at closest center, start at closest node and start at mean, perform worse and very similarly to each other due to the large distance between the starting location and the goal. However, as the homophily increases, start at goal soon becomes the worst performing strategy, followed by start at mean, then start at closest node and start at closest center, which perform very similarly, and lastly start at largest center, which performs the best when non-dynamic networks have a high degree of homophily. This can all be seen in Figure 4.7.

Figure 4.5: Scores of the different strategies using the following parameters: n = 20, t = 20, k = 3, d = True, e = 100, g ∈ {(0, 0), (0, 1), (1, 0), (1, 1)}, B = [0, 1] × [0, 1]

4.4 Discussion of the results

Performance of the start at goal strategy Recall from Sections 4.1-4.3 that the start at goal strategy outperforms all other strategies when the homophily of a network is very low. The suspected reason for this low score is that when networks have no homophily, the distance between nodes is no longer a factor and all nodes have an equal chance of connecting with each other. Thus we can derive the change of a node's belief from the equation in Figure 3.2, as shown in Figure 4.13.

This leads to the conclusion that the strategy which places its influencer node closest to the goal will perform better than strategies which place their influencer node between the goal and the nodes in a network. This would explain the good performance of the start at goal strategy when the degree of homophily is low. It would also mean that the closest node and closest center strategies score better when the homophily is very low, which is corroborated by Figures 4.1 and 4.4. However, this cannot be seen in Figure 4.8, which could be explained by the fact that the scores of all strategies are very high in general. Therefore, small differences in score can be hard to see.

Figure 4.6: Scores of the different strategies using the following parameters: α = ∞, t = 20, k = 3, d = False, e = 100, g ∈ {(0, 0), (0, 1), (1, 0), (1, 1)}, B = [0, 1] × [0, 1]

In contrast, as the degree of homophily increases, the start at goal strategy starts performing worse than all other strategies, as seen in Figures 4.4 and 4.7. The explanation for this is that the large distance between the influencer node and the other nodes in the network makes the probability of connections very low, which decreases the ability of the influencer node to influence the network.

The trade-off between more connections and stronger influence Due to the nature of the SDA model, there is an inherent trade-off between having the influencer node advertise a belief close to other nodes, to create more connections with regular nodes, and having the influencer node advertise a belief near the goal, to create a stronger influence for each connection made. In the results, the difference in performance between the two goals described in Section 3.3.1 can be observed in the non-radical and radical influencing of both non-dynamic and dynamic networks, as seen in Figures 4.1 and 4.4. There, the Largest center and the Average strategies have a similar score while the homophily of the network is low, and both perform worse than the other strategies. This is due to the fact that the starting location of the influencer nodes placed by the Largest center and the Average strategies is too far from the goal. This causes the other nodes to be influenced towards the goal less, even if they are directly connected to the influencer node.

Figure 4.7: Scores of the different strategies when influencing networks generated using the following parameters: n = 20, t = 20, k = 4, d = False, e = 100, g ∈ {(−20, −20), (−20, 20), (20, −20), (20, 20)}, B = [0, 1] × [0, 1]

However, when the homophily of the generated networks increases, the largest center strategy starts to outperform the average strategy in all non-dynamic networks. This is caused by the fact that the largest center strategy places the influencer node very close to a select number of nodes, while the average strategy places it not far away from, but also not purposefully close to, any group of nodes.

Influential parameters The degree of homophily in a network has a significant influence on the performance of all strategies. Generally, the higher the homophily of a network, the worse the strategies perform. The cause of this is that the static strategy causes a larger distance between the influencer node and the other nodes in the network. Higher homophily decreases the chance of connections being made between nodes that are far apart, and thus fewer connections are made between the influencer node and the other nodes. This in turn makes the influencer node less influential.


Figure 4.8: Scores of the different strategies when influencing networks generated using the following parameters: n = 20, t = 20, k = 4, d = True, e = 100, g ∈ {(−20, −20), (−20, 20), (20, −20), (20, 20)}, B = [0, 1] × [0, 1]

moved over time with dynamic networks. When the influencer node moves towards the goal, it can move too fast. This increases the distance between the influencer node and the rest of the network over time and, in turn, lowers the chance of connections between the influencer node and other nodes. This leads to a worse score. This can be observed in Appendix C for the different strategies.

The size of the influenced network has a significant influence on the scores of the strategies in all cases: the larger the network, the worse the scores of the strategies. The cause is that, with more nodes and all connections weighted equally, it becomes harder for a single influencer node to influence the entire network. However, the size does not seem to have any significant influence on which strategy performs better than any other strategy.

α = ∞, t = 20, k = 4, d = False, e = 100, g ∈ {(−20, −20), (−20, 20), (20, −20), (20, 20)}, B = [0, 1] × [0, 1]

α = 0, t = 20, k = 4, d = False, e = 100, g ∈ {(−20, −20), (−20, 20), (20, −20), (20, 20)}, B = [0, 1] × [0, 1]

α = 0, t = 20, k = 4, d = True, e = 100, g ∈ {(−20, −20), (−20, 20), (20, −20), (20, 20)}, B = [0, 1] × [0, 1]

α = ∞, t = 20, k = 4, d = True, e = 10, g ∈ {(−20, −20), (−20, 20), (20, −20), (20, 20)}, B = [0, 1] × [0, 1]

\[ n_{i+1} = \frac{1}{2} n_i + s_i \]


Chapter 5 Conclusion

In conclusion, this thesis has examined how the spreading of information in random geometric social networks can be influenced by placing a strategic stubborn influencer node inside the network. To do this, networks were generated by placing nodes in clusters in a 2-dimensional space and making connections between them using the Social Distance Attachment (SDA) model. The DeGroot model was then used to simulate the spreading of beliefs through the network. Finally, different strategies of placing the influencer node and moving it through the network towards a given goal were developed.

The strategies were developed based on two things that increase the influencer node's influence: having as many connections with other nodes as possible, and being placed as close to the goal as possible. All strategies were given a score based on the distance between the given goal and the average belief of the network after a given number of iterations. The scores of these strategies were compared to each other and to a baseline in which networks were not influenced at all.

The comparison was done for three different scenarios, based on the position of the goal belief: in the non-radical case, the goal belief of the influencer was within the range where the nodes of the network were placed; in the radical case, the goal belief was placed at the corners of the range where regular nodes were placed; and in the very radical case, the goal belief was placed well outside of the range where regular nodes were placed. This was also done for networks of different sizes, different degrees of homophily, and with dynamic or non-dynamic connections.

Overall, the degree of homophily in a network and whether the network is dynamic or not determine which strategy performs the best. If the degree of homophily in a network is low, the strategies which are focused on starting out close to the goal perform better than strategies where the influencer node is mainly placed in more populated areas of the belief space. But as the homophily increases, the strategies which focus on placing the influencer node near as many other nodes as possible start to perform as well as or better than the other strategies. The size of the network, on the other hand, does not determine which strategy performs the best, although it does have an influence on the scores of all strategies.

5.1 Further research

In this thesis, it is assumed that every dimension in the social belief space is equally susceptible to change. Naturally, this is not the case. While opinions about the world might be quite susceptible to change, other very critical factors in making connections with other people, like gender identity or ethnicity, are not. This is currently not accounted for in the model used and might give very different results for each tested strategy.

Another factor that is not accounted for in the simulations performed in this thesis is the weighting of the different dimensions of the social subspace. Currently, every dimension is given equal weight, while this might not be fully representative of reality.

Another thing that has not been accounted for is the difference in weights of the connections between nodes. In this thesis, it is assumed that they are all equal. However, the connections might in reality be stronger or weaker based on the social distance between nodes. This could result in higher or lower weights given to connections based on distance. More research into the weight of connections in the flow of information through social networks is thus recommended.

Lastly, a recommendation for future research is also to develop more dynamic influencing strategies where the movement of the influencer node is based on the beliefs of the other nodes in the network. This could be achieved with machine learning techniques like reinforcement learning.


Bibliography

[1] Daron Acemoglu, Munther A Dahleh, Ilan Lobel, and Asuman Ozdaglar. Bayesian learning in social networks. The Review of Economic Studies, 78(4): 1201–1236, 2011.

[2] Abhijit Banerjee, Arun G Chandrasekhar, Esther Duflo, and Matthew O Jackson. The diffusion of microfinance. Science, 341(6144), 2013.

[3] Marián Boguná, Romualdo Pastor-Satorras, Albert Díaz-Guilera, and Alex Arenas. Models of social networks based on social distance attachment. Physical review E, 70(5):056122, 2004.

[4] Emanuele Corbellini. Fake news: roots, dangers, and possible solutions. 2019.

[5] Sergio Currarini, Matthew O Jackson, and Paolo Pin. An economic model of friendship: Homophily, minorities, and segregation. Econometrica, 77(4): 1003–1045, 2009.

[6] Morris H DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

[7] Igor Douven and Rainer Hegselmann. Mis-and disinformation in a bounded confidence model. Artificial Intelligence, 291:103415, 2020.

[8] Benjamin Golub and Matthew O Jackson. Naive learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics, 2 (1):112–49, 2010.

[9] Rainer Hegselmann, Ulrich Krause, et al. Opinion dynamics and bounded confidence models, analysis, and simulation. Journal of artificial societies and social simulation, 5(3), 2002.

[10] Matthew O Jackson. Social and economic networks. Princeton university press, 2010.


[11] Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual review of sociology, 27(1):415–444, 2001.

[12] Mauro Mobilia. Does a single zealot affect an infinite group of voters? Physical review letters, 91(2):028701, 2003.

[13] Mathew Penrose et al. Random geometric graphs, volume 5. Oxford university press, 2003.

[14] Szymon Talaga and Andrzej Nowak. Homophily as a process generating social networks: insights from social distance attachment model. arXiv preprint arXiv:1907.07055, 2019.

[15] Toni GLA Van der Meer, Michael Hameleers, and Anne C Kroon. Crafting our own biased media diets: The effects of confirmation, source, and negativity bias on selective attendance to online news. Mass Communication and Society, 23(6):937–967, 2020.


Chapter 6 Appendices

6.1 Appendix A

Example of a social network generated using the SDA model which learns using the DeGroot model.


The network in Figure 6.1 is generated using the following parameters: n = 5, t = 20, k = 2, α = ∞


6.2 Appendix C

6.2.1 Average

Figure 6.2: Example of a network being influenced by the start strategy start at average and the move linearly strategy.


6.2.2 Start at goal

Figure 6.3: Example of a network being influenced by the start strategy start at goal and the move linearly strategy.


6.2.3 Closest node

Figure 6.4: Example of a network being influenced by the start strategy Closest node and the move linearly strategy.


6.2.4 Largest center

Figure 6.5: Example of a network being influenced by the start strategy start at largest center and the move linearly strategy.

