
An analytical approach to information dissipation.

Kees Til

August 29, 2018

MSc Thesis
Supervisor: dr. R. Quax

Instituut voor Informatica (IvI)


Abstract

In this thesis a new measure called the information dissipation time (IDT) is introduced. The IDT measures the time it takes for the information that a unit shares with the system to drop below some small threshold $\epsilon$. A major drawback of this measure is that for large systems it requires Monte Carlo sampling, which is computationally expensive. An alternative is to investigate how far information travels. The distance $d$ that the information of a node $n$ can travel before it disappears or drops below some small threshold $\epsilon$ measures the extent to which $n$ influences the global system state: if the distance is small, the effects remain local; if it is large, the node affects the global system state. We call this measure the information dissipation length (IDL).

We investigate networks based on the kinetic Ising model, where the Gibbs measure is used with the Hamiltonian as energy function. The mutual information of different kinds of directed networks is analyzed, such as a simple path graph and a Cayley graph. After an in-depth analysis we conclude that the mutual information between $x_0$ and $X_d = \{x \in S \mid \mathrm{distance}(x, x_0) = d\}$ is bounded, given that the source node $x_0$ has a $(\tfrac{1}{2},\tfrac{1}{2})$ distribution and that every node in the system has one incoming edge:

$$F_{\mathrm{Fork}}(n(d), \beta, d) \le I(x_0; X_d) \le F_{\mathrm{Star\ path}}(n(d), \beta, d)$$

Under these constraints we see that, in general, high-degree units have the highest information dissipation length. This implies that the most influential nodes in the network are the ones with the highest degree. However, for high enough temperatures the topological structure of the network does not matter, since the mutual information between $x_0$ and $X_d$ follows the same power law for all networks.

Title: An analytical approach to information dissipation
Author: Kees Til, kees.til@student.uva.nl, 10385827
Supervisor: dr. R. Quax
Assessors: dr. V. Krzhizhanovskaya and dr. M. Lees
Date: August 29, 2018

Instituut voor Informatica (IvI), Universiteit van Amsterdam
Science Park 904, 1098 XH Amsterdam
http://gss.uva.nl/content/masters/computational-science/computational-science.html

Contents

1. Introduction
2. Information dissipation time and information dissipation length
   2.0.1. Mathematical terminology
   2.0.2. IDL for a path of single nodes
3. Research
   3.1. Calculating mutual information of different networks
        3.1.1. Star graph
        3.1.2. Star path graph
        3.1.3. Fork
        3.1.4. Multiple forks
        3.1.5. Cayley graphs
4. Analysis of the mutual information
   4.1. Star graph
   4.2. Star path graph
   4.3. Fork
   4.4. Multiple forks
   4.5. Cayley graphs
5. Approximations
   5.1. Star graph
   5.2. Star path graph
6. Conclusion
7. Further research
A. Entropy and mutual information
   A.0.1. Entropy
   A.0.2. Conditional entropy and joint entropy
   A.0.3. Mutual information
   A.0.4. Conditional mutual information
B. Approximating the mutual information using the De Moivre-Laplace theorem
C. Literature research: Causality
   C.1. Causality from a philosophical perspective
   C.2. Causality from a mathematical perspective
   C.3. The difficulties of causality
        C.3.1. Instantaneous causality
        C.3.2. Indirect causality
D. Literature research: Measures for causality
   D.1. Granger causality
        D.1.1. Limitations of Granger causality
   D.2. Transfer entropy
        D.2.1. Definition
        D.2.2. Limitations of the transfer entropy

Acknowledgements

I would like to thank my supervisor Rick Quax for introducing me to the subject and granting me a generous amount of his time. Answering most of my questions and being very patient has helped this thesis take its present form. Furthermore, I would like to thank my assessors Valeria Krzhizhanovskaya and Mike Lees for taking the time to evaluate my thesis.

For the reader:

In this thesis my research on information dissipation is presented. In appendices C and D one can read my literature research. This literature research concerns the notion of causality and the measures that have been used to quantify the causal power of units within a network, for which the IDT and IDL are eventually intended to be used.


1. Introduction

When researching complex networks, the main goal is to understand how the individual behavior of the units affects the system as a whole. Which node has the most influence on the network, and what kind of properties must it have to exert this influence? To quantify the influence of a node on the system as a whole we use mutual information. Mutual information is one of many quantities that measure how much one random variable tells us about another; here it quantifies how much information about the system state is explained by an individual unit. As time progresses, the information about $S^t$ that is explained by unit $n$ becomes smaller. How long does it take for the information that $n$ shares with the system state $S^t$ to drop below some small threshold $\epsilon$, or even to disappear completely?

To understand the way information flows through a system, a measure called the information dissipation time (IDT) will be introduced. It measures the amount of time $t$ it takes for the information of some node $n$ at $t = t_0$ to drop below some small threshold $\epsilon$, or to disappear from the system state $S^t$ entirely. If we find the node $n$ with the highest IDT, we know that the mutual information of this node with the system takes the longest to reach $\epsilon$ or 0. This means that the impact of $n$ on the trajectory of the system lasts the longest.

To calculate the IDT, one must measure the mutual information between $n$ and $S^t$ over time. A common method is to simulate the system with Monte Carlo sampling and then construct the probability distributions needed to calculate the mutual information. The problem with this method is that for large systems Monte Carlo sampling is computationally expensive. For example, suppose one has a system of 100 units where each unit can be in state 1 or $-1$. The total number of possible system states is then $2^{100}$, so one would need to construct a probability distribution over $2^{100}$ states, which is infeasible with Monte Carlo sampling. To avoid this problem, one can try to construct a closed-form formula for the mutual information.

An alternative is to consider the mutual information between a unit $x_0$ and the units at distance $d$ from $x_0$, denoted $X_d$. With this calculation we can see how far the information of $x_0$ travels before it disappears or drops below some small threshold $\epsilon$. This is a measure of the extent to which $x_0$ influences the global system state: if it is low, the effects remain local, whereas if it is high it may lead to a system-wide change of state. We call this distance the information dissipation length (IDL).


2. Information dissipation time and information dissipation length

As [8] describes: “The main goal of complex systems research is to understand how the dynamics of individual units combine to produce the behavior of the system as a whole.”

A common method is to remove or adjust a unit of the network and see what effect this has on the system's behavior. This measures how robust the system is to a perturbation. However, if the perturbation is not part of the natural behavior of the system, the conclusions drawn from such an experiment can be false, since the perturbed system is not a representative model of the original system. R. Quax [8] illustrates this by showing that highly connected ferromagnetic spins hardly explain the observed dynamical behavior of a system, even though removing such a spin would have a large impact on the average magnetization, stability, and critical temperature. As a measure of the dynamical importance of a unit $s$, a quantity called the information dissipation time (IDT) can be computed: the time it takes for the information shared by a unit $s$ and the system state $S$ to reach 0 or some small threshold $\epsilon$.

2.0.1. Mathematical terminology

For simplicity we assume that our system $S$ is an undirected graph. A system $S$ contains units $s_1, s_2, \ldots, s_n$, where some units interact with each other through edges $E = \{(s_i, s_j) \mid s_i \text{ is connected by an edge to } s_j\}$. The number of interactions $s_i$ has with other units is called the degree of $s_i$ and is denoted by $k_i$. The set of units that $s_i$ interacts with is $h_i = \{x \mid (s_i, x) \in E\}$. The state of unit $s_i$ at time $t$ is denoted by $s_i^t$.

The collection $S^t = (s_1^t, \ldots, s_n^t)$ forms the system state at time step $t$. Unit $s_i$ chooses its next state $x$ according to $p(s_i^{t+1} = x \mid h_i)$; such a system is called a Markov network.

Using appendix A, one can calculate the number of bits of information of a system state $S^t$ by calculating its entropy $H(S^t)$. To quantify the information shared between a unit at a certain time $t_0$ ($s_i^{t_0}$) and the system state $S^t$, one needs to calculate the mutual information $I(s_i^{t_0}; S^t)$. If $I(s_i^{t_0}; S^t) = 0$, the system state is independent of the unit, meaning that $s_i^{t_0}$ has no influence on $S^t$.

As time progresses, the information that $s_i^{t_0}$ shares with $S^t$, $I(s_i^{t_0}; S^t)$, becomes smaller. This quantity measures the dynamical importance of a single unit to the total state space, since it quantifies how $s_i^{t_0}$ influences the trajectory of $S^t$. The longer it takes for $I(s_i^{t_0}; S^t)$ to reach zero, the longer $s_i^{t_0}$ influences the system and thus the longer its impact on the trajectory. The time it takes for $I(s_i^{t_0}; S^t)$ to drop below some small threshold $\epsilon$ is what we call the information dissipation time (IDT).
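The entropy and mutual information used here are defined in appendix A. As a concrete reference (a generic sketch, not code from the thesis; the helper name is hypothetical), the snippet below computes $I(X; Y)$ in bits from a joint probability table.

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) in bits from a joint probability table p(x, y) (rows: x, cols: y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))

# a uniform binary unit copied with probability 0.9 shares about 0.53 bits with its copy
print(mutual_information([[0.45, 0.05],
                          [0.05, 0.45]]))
```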

Now consider the distance that the information of some unit $x_0$ can travel within the network before it is lost or drops below some small threshold $\epsilon$. We call this the information dissipation length (IDL). In this thesis we mainly focus on this measure, by calculating for different networks the mutual information between some unit $x_0$ and the set of nodes at distance $d$ from $x_0$, denoted $X_d$. In the next section we show how one calculates the IDL for simple paths. This calculation reproduces the work in [8], page 18.

2.0.2. IDL for a path of single nodes

Suppose one has a network that follows a path:

$$s_1 \to s_2 \to s_3 \to \ldots$$

with copy probability $p(s_{i+1} = s_i \mid s_i) = p$. Furthermore, we take $s_1$ to have the probability distribution $(\tfrac{1}{2}, \tfrac{1}{2})$.

Since all nodes have equivalent dynamics we eventually expect an exponential decay of information from the first node, i.e., $I(s_1; s_i) \approx e^{\alpha i}$. However, in the first few steps the decay may deviate from an exponential, because the initial node is distributed $(\tfrac{1}{2}, \tfrac{1}{2})$, which differs from the equilibrium distribution of $s_i$ for sufficiently large $i$. We therefore calculate the rate of decay that is eventually attained as $i \to \infty$.

For the purpose of calculating the mutual information, note that $p(s_1 \mid s_i) = p(s_1 = s_i)$. Furthermore, $p(s_1 = s_i)$ equals the probability of an even number of mismatches between consecutive pairs $(s_{j-1}, s_j)$, i.e. the probability that a binomial random variable takes an even value:

$$p(\text{mismatches even}) = p(s_1 = s_i) \qquad (2.1)$$

$$= \sum_{k=0}^{\lfloor (i-1)/2 \rfloor} \binom{i-1}{2k} p^{i-1-2k} (1-p)^{2k} \qquad (2.2)$$

$$= \frac{1 + (2p-1)^{i-1}}{2} \qquad (2.3)$$

where the last equality follows from the binomial theorem; note that we use $i-1$ since we start at index 1. The following example shows the calculation from 2.2 to 2.3:

Using the binomial theorem,

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k},$$

we see that:

$$((1-p) + p)^{i-1} = \sum_{k=0}^{i-1} \binom{i-1}{k} p^k (1-p)^{i-1-k} = \sum_{k=0}^{\lfloor (i-1)/2 \rfloor} \binom{i-1}{2k} p^{2k} (1-p)^{i-1-2k} + \sum_{k=0}^{\lfloor (i-1)/2 \rfloor} \binom{i-1}{2k+1} p^{2k+1} (1-p)^{i-1-(2k+1)} = P\{X \text{ even}\} + P\{X \text{ odd}\}$$

and

$$((1-p) - p)^{i-1} = \sum_{k=0}^{i-1} \binom{i-1}{k} (-p)^k (1-p)^{i-1-k} = \sum_{k=0}^{\lfloor (i-1)/2 \rfloor} \binom{i-1}{2k} p^{2k} (1-p)^{i-1-2k} - \sum_{k=0}^{\lfloor (i-1)/2 \rfloor} \binom{i-1}{2k+1} p^{2k+1} (1-p)^{i-1-(2k+1)} = P\{X \text{ even}\} - P\{X \text{ odd}\}$$

Adding the two expressions and dividing by 2 gives:

$$P\{X \text{ even}\} = \tfrac{1}{2}\left(1 + (1-2p)^{i-1}\right) = \tfrac{1}{2} + \tfrac{1}{2}(1-2p)^{i-1}$$

Now we see that, with $q = 2p - 1$:

$$\lim_{i\to\infty} R_i = \lim_{i\to\infty} \frac{I(s_1; s_i)}{I(s_1; s_{i+1})} \qquad (2.4)$$

$$= \lim_{i\to\infty} \frac{H(s_1) - H(s_1 \mid s_i)}{H(s_1) - H(s_1 \mid s_{i+1})} \qquad (2.5)$$

$$= \lim_{i\to\infty} \frac{1 + \frac{q+q^i}{2q}\log\!\left(\frac{q+q^i}{2q}\right) + \left(1 - \frac{q+q^i}{2q}\right)\log\!\left(1 - \frac{q+q^i}{2q}\right)}{1 + \frac{q+q^{i+1}}{2q}\log\!\left(\frac{q+q^{i+1}}{2q}\right) + \left(1 - \frac{q+q^{i+1}}{2q}\right)\log\!\left(1 - \frac{q+q^{i+1}}{2q}\right)} \qquad (2.6)$$

$$= \lim_{i\to\infty} \frac{\tfrac{1}{2} q^{i-1}\log(q)\left[\log\!\left(\frac{q+q^i}{2q}\right) - \log\!\left(1 - \frac{q+q^i}{2q}\right)\right]}{\tfrac{1}{2} q^{i}\log(q)\left[\log\!\left(\frac{1+q^i}{2}\right) - \log\!\left(\frac{1-q^i}{2}\right)\right]} \qquad (2.7)$$

$$= \lim_{i\to\infty} \frac{\log\!\left(\frac{q+q^i}{2q}\right) - \log\!\left(1 - \frac{q+q^i}{2q}\right)}{q\left[\log(1+q^i) - \log(1-q^i)\right]} \qquad (2.8)$$

$$= \lim_{i\to\infty} \frac{\frac{q^{i-1}\log(q)}{1+q^{i-1}} + \frac{q^{i-1}\log(q)}{1-q^{i-1}}}{\frac{q^{i+1}\log(q)}{1+q^{i}} + \frac{q^{i+1}\log(q)}{1-q^{i}}} \qquad (2.9)$$

$$= \lim_{i\to\infty} \frac{1}{q^2}\cdot\frac{1-q^{2i}}{1-q^{2i-2}} \qquad (2.10)$$

$$= \frac{1}{q^2} \qquad (2.11)$$

$$= \frac{1}{(2p-1)^2} \qquad (2.12)$$

Note that if $q \in (0, 1)$, then $\lim_{i\to\infty} I(s_1; s_i) = \lim_{i\to\infty} I(s_1; s_{i+1}) = 0$; thus in steps 2.6 and 2.7 we used l'Hôpital's rule.

In step 2.11 we conclude that $R_i$ has a constant rate $f = \frac{1}{(2p-1)^2}$. Note that this is the relative IDL, which is clearly different from the absolute IDL. The relative IDL for some $\phi$ can now be defined as:

$$\mathrm{rel\ IDL}(s_1) = \log_{1/f}(\phi)$$

Suppose we choose $\phi = \tfrac{1}{2}$ and $p = \tfrac{3}{4}$; then:

$$\mathrm{rel\ IDL}(s_1) = \log_{0.25}\!\left(\tfrac{1}{2}\right) = \tfrac{1}{2}$$

meaning that $s_1$ has lost half of its information after travelling a length of $\tfrac{1}{2}$. Finally, the figure below illustrates how the IDL of this chain of coin flips depends on the copy probability $p$.

Figure 2.1.: The information dissipation length (IDL) of a sequence of conditional coin flips, where each coin flip has the same outcome as the previous coin flip with probability $p$, and $\epsilon = \tfrac{1}{2}$. As $p \to 0$ or $p \to 1$ the IDL diverges to infinity; as $p \to \tfrac{1}{2}$ the IDL goes to zero.
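To make the decay concrete, the sketch below (hypothetical helper names, not part of the thesis code) evaluates $I(s_1; s_i) = 1 - H_b\!\left(\tfrac{1+(2p-1)^{i-1}}{2}\right)$ for the chain of conditional coin flips and checks that the ratio $I(s_1;s_i)/I(s_1;s_{i+1})$ approaches $1/(2p-1)^2$.

```python
import numpy as np

def binary_entropy(r):
    """Entropy (in bits) of a Bernoulli(r) variable."""
    r = np.clip(r, 1e-15, 1 - 1e-15)
    return -(r * np.log2(r) + (1 - r) * np.log2(1 - r))

def chain_mi(p, i):
    """I(s_1; s_i) for the copy chain: s_1 uniform, each node copies its
    predecessor with probability p, and r_i = P(s_1 = s_i)."""
    r = (1 + (2 * p - 1) ** (i - 1)) / 2
    return 1.0 - binary_entropy(r)

p = 0.75
for i in (2, 5, 10, 20):
    ratio = chain_mi(p, i) / chain_mi(p, i + 1)
    print(i, chain_mi(p, i), ratio)  # ratio tends to 1/(2p-1)^2 = 4
```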

The next step is to calculate the IDL for other directed networks. If we are able to derive the mutual information between $x_0$ and all nodes in layer $d$,

$$X_d = \{x \in S \mid \mathrm{distance}(x, x_0) = d\},$$

we can find the absolute IDL by finding the $d$ for which

$$I(x_0; X_d) = \epsilon.$$

With the help of this calculation we can find the $x_0$ with the highest IDL value, and thus the most influential node in the network.


3. Research

In my research I focus on calculating the IDL by computing the mutual information between a unit $x_0$ and the nodes in the set $X_d = \{x \in S \mid \mathrm{distance}(x_0, x) = d\}$.

The networks are based on the kinetic Ising model with the Gibbs measure:

$$P(x_j^{t+1} = r \mid x_i^t) = \frac{1}{Z(\beta)} e^{\beta E[x_j^{t+1} = r \mid x_i^t]}, \qquad P(x_j^{t+1} = r) = \frac{1}{Z(\beta)} e^{\beta E[x_j^{t+1} = r]}$$

where $E(x)$ is the energy of configuration $x$. In the kinetic Ising model the energy of a configuration $\sigma$ is given by the Hamiltonian, where $\sigma_i$ denotes the spin of site $i$ and the sum over pairs counts each pair only once:

$$E(\sigma) = -\sum_{\langle i\, j \rangle} J_{ij}\, \sigma_i \sigma_j - \mu \sum_j h_j \sigma_j$$

For simplicity we assume that $\mu = 0$ and $J_{ij} = 1$. This research is important because it identifies the most influential nodes analytically, which requires much less computation time than doing so with Monte Carlo sampling. My research question can be formulated as follows:

For a network based on the kinetic Ising model, can we derive a closed-form expression for the mutual information between $x_0$ and $X_d$? Furthermore, can we determine which properties a unit within the system must have to attain a high IDL value?

To answer this question, the following sub-questions must be answered:

• Can we construct an analytical expression for the mutual information between some source node $x_0$ and the nodes at distance $d$ ($X_d$) for different kinds of networks? Can we find bounds on the mutual information between $x_0$ and $X_d$?

• How can we compare different kinds of networks? Does the mutual information have universal properties?

• Can we construct an approximation of the mutual information that comes close enough to the analytical formula?

3.1. Calculating mutual information of different networks

In this section we focus on calculating the mutual information between some source node $x_0$ and the nodes at distance $d$ from $x_0$ ($X_d$). We calculate the mutual information for the following networks:

• Star graph
• Star path graph
• Fork
• Multiple forks
• Cayley graph

3.1.1. Star graph

A star graph $G = (V, E)$ is a graph with a source node $x_0$ that has $n$ outgoing edges to other nodes $x_1^1, \ldots, x_n^1$. We concentrate first on nodes at distance $d = 1$; thus none of the $x_i^1$ have outgoing edges. [Illustration: a star graph with source $x_0$ and leaves $x_1^1, x_2^1, x_3^1, x_4^1$.]

Recall that the probabilities are defined by the Gibbs measure:

$$P(x_j^{t+1} = r \mid x_i^t) = \frac{1}{Z(\beta)} e^{\beta E[x_j^{t+1} = r \mid x_i^t]}, \qquad P(x_j^{t+1} = r) = \frac{1}{Z(\beta)} e^{\beta E[x_j^{t+1} = r]}$$

where $E(x)$ is the energy of configuration $x$. The energy of a configuration $\sigma$ is given by the Hamiltonian, where $\sigma_i$ denotes the spin of site $i$ and each pair is counted only once:

$$E(\sigma) = -\sum_{\langle i\, j \rangle} J_{ij}\, \sigma_i \sigma_j - \mu \sum_j h_j \sigma_j$$

For simplicity, assume that $\mu = 0$ and $J_{ij} = 1$. Furthermore, note that a star graph has a few properties:

• Since all nodes have the single source $x_0$, we have $p(x_1 \ldots x_n \mid x_0) = \prod_i p(x_i \mid x_0)$.

• Since all nodes have only $x_0$ as incoming neighbour, $p(x_1 \mid x_0) = p(x_2 \mid x_0) = \cdots = p(x_n \mid x_0)$.

• The source node has no incoming edges, hence its distribution is $(\tfrac{1}{2}, \tfrac{1}{2})$. This means that in the end the conclusions drawn only apply to source nodes with a $(\tfrac{1}{2}, \tfrac{1}{2})$ distribution. This property is desirable since we use it in equation 3.1.

With some calculus, the mutual information between the source $x_0$ and the other nodes $(x_1, x_2, \ldots, x_n)$ can be calculated:

$$I(x_0; x_1, \ldots, x_n) = \sum_{x_0, x_1, \ldots, x_n} p(x_0, x_1, \ldots, x_n) \log\!\left(\frac{p(x_0, x_1, \ldots, x_n)}{p(x_0)\, p(x_1, \ldots, x_n)}\right)$$

$$= \sum_{x_0, \ldots, x_n} p(x_0, x_1, \ldots, x_n) \Big[\log p(x_1, \ldots, x_n \mid x_0) - \log p(x_1, \ldots, x_n)\Big]$$

$$= \sum_{x_0, \ldots, x_n} p(x_0)\, p(x_1, \ldots, x_n \mid x_0) \Big[\sum_{i=1}^{n} \log p(x_i \mid x_0) - \log\Big(\sum_{x_0} \prod_{i=1}^{n} p(x_i \mid x_0)\, p(x_0)\Big)\Big]$$

$$= \sum_{x_0, \ldots, x_n} p(x_0)\, p(x_1, \ldots, x_n \mid x_0) \Big[\sum_{i=1}^{n} \log p(x_i \mid x_0) - \log\Big(\prod_{i=1}^{n} p(x_i \mid x_0{=}1)\, p(x_0{=}1) + \prod_{i=1}^{n} p(x_i \mid x_0{=}-1)\, p(x_0{=}-1)\Big)\Big]$$

Since $p(x_1 \ldots x_n \mid x_0) = \prod_i p(x_i \mid x_0)$ and $p(x_1 = a \mid x_0) = p(x_2 = a \mid x_0) = \cdots = p(x_n = a \mid x_0)$, for $x_0 \in \{-1, 1\}$ we can use the following substitution:

$$\sum_{x_0, x_1 \ldots x_n} p(x_0)\, p(x_1 \ldots x_n \mid x_0) = \sum_{x_0} p(x_0) \sum_{x_1 \ldots x_n} p(x_1 \ldots x_n \mid x_0) = \sum_{a \in \{-1,1\}} p(a) \sum_{k=0}^{n} \binom{n}{k} q_a^k (1 - q_a)^{n-k}$$

where $q_a = p(x_i = 1 \mid x_0 = a)$ for all $i \in \{1, \ldots, n\}$ and $a \in \{-1, 1\}$.

The last equality follows from the fact that $p(x_1 \ldots x_n \mid x_0 = a) = \prod_{i=1}^{n} p(x_i \mid x_0 = a)$: we have a chain of $n$ independent experiments where the probability of success is $p(x_i = 1 \mid x_0 = a)$ and the probability of failure is $p(x_i = -1 \mid x_0 = a)$, so the number of successes is binomially distributed.

Proceeding further, we see that:

$$I(x_0; x_1, \ldots, x_n) = \sum_{a} p(a) \sum_{k=0}^{n} \binom{n}{k} q_a^k (1 - q_a)^{n-k} \Big[\log\!\big(q_a^k (1 - q_a)^{n-k}\big) - \log\!\Big(q_1^k (1 - q_1)^{n-k}\, p(a{=}1) + q_{-1}^k (1 - q_{-1})^{n-k}\, p(a{=}-1)\Big)\Big] \qquad (3.1)$$

For the star graph the relevant distributions are:

• $X_0^0 \equiv (\tfrac{1}{2}, \tfrac{1}{2})$

• $(X_i^1 \mid X_0^0 = 1) = \left(\frac{e^{-\beta}}{e^{\beta}+e^{-\beta}}, \frac{e^{\beta}}{e^{\beta}+e^{-\beta}}\right) = (q_{-1}, q_1)$

• $(X_i^1 \mid X_0^0 = -1) = \left(\frac{e^{\beta}}{e^{\beta}+e^{-\beta}}, \frac{e^{-\beta}}{e^{\beta}+e^{-\beta}}\right) = (q_1, q_{-1})$

Note that $q_{-1} = 1 - q_1$ and that:

$$\log\!\left(q_a^k (1-q_a)^{n-k}\right) - \log\!\left(q_a^k (1-q_a)^{n-k} + q_a^{n-k} (1-q_a)^{k}\right) = -\log\!\left(\frac{q_a^k (1-q_a)^{n-k} + q_a^{n-k}(1-q_a)^{k}}{q_a^k (1-q_a)^{n-k}}\right) = -\log\!\left(1 + q_a^{n-2k}(1-q_a)^{2k-n}\right) = -\log\!\left(1 + \left(\frac{q_a}{1-q_a}\right)^{n-2k}\right)$$

Using this result, the fact that $p(a = 1) = p(a = -1) = \tfrac{1}{2}$, and equation 3.1, we get:

$$I(x_0^0; x_1^1, \ldots, x_n^1) = \frac{1}{2}\sum_{k=0}^{n}\binom{n}{k} q_1^k(1-q_1)^{n-k}\left[1 - \log\!\left(1 + \left(\frac{q_1}{1-q_1}\right)^{n-2k}\right)\right] + \frac{1}{2}\sum_{k=0}^{n}\binom{n}{k} q_1^{n-k}(1-q_1)^{k}\left[1 - \log\!\left(1 + \left(\frac{1-q_1}{q_1}\right)^{n-2k}\right)\right]$$

$$= 1 - \frac{1}{2}\sum_{k=0}^{n}\binom{n}{k} q_1^k(1-q_1)^{n-k}\log\!\left(1 + \left(\frac{q_1}{1-q_1}\right)^{n-2k}\right) - \frac{1}{2}\sum_{k=0}^{n}\binom{n}{k} q_1^{n-k}(1-q_1)^{k}\log\!\left(1 + \left(\frac{1-q_1}{q_1}\right)^{n-2k}\right)$$

$$= 1 - \sum_{k=0}^{n}\binom{n}{k} q_1^k(1-q_1)^{n-k}\log\!\left(1 + \left(\frac{q_1}{1-q_1}\right)^{n-2k}\right)$$

$$= 1 - \sum_{k=0}^{n}\binom{n}{k} q_1^k(1-q_1)^{n-k}\log\!\left(1 + \left(e^{2\beta}\right)^{n-2k}\right) \qquad (3.2)$$

where the symmetry rule for binomial coefficients, $\binom{n}{k} = \binom{n}{n-k}$, has been used to combine the two sums. Equation 3.2 depends only on $\beta$ and $n$, so we can define the mutual information between $x_0$ and its neighbours as:

$$F_{\mathrm{Star}}(n, \beta) = 1 - \sum_{k=0}^{n}\binom{n}{k} q_1^k(1-q_1)^{n-k}\log\!\left(1 + \left(\frac{q_1}{1-q_1}\right)^{n-2k}\right), \qquad q_1 = \frac{e^{\beta}}{e^{\beta}+e^{-\beta}}$$

where $n$ is the number of neighbours $(x_1, \ldots, x_n)$ and $\beta = \tfrac{1}{T}$.

Note that the larger $n$, the higher the mutual information between $x_0$ and its neighbours. This can be explained by the fact that $I(A; B) \le I(A; B, C)$ since $B \subseteq B \cup C$. When $T$ is chosen high enough, $F_{\mathrm{Star}}(n, \tfrac{1}{T})$ converges to 0, since (note that $\frac{q_1}{1-q_1} = e^{2\beta}$):

$$1 - \lim_{\beta \to 0} \sum_{k=0}^{n} \binom{n}{k} q_1^k (1-q_1)^{n-k} \log\!\left(1 + e^{2\beta(n-2k)}\right) = 1 - \sum_{k=0}^{n} \binom{n}{k} q_1^k (1-q_1)^{n-k} = 0$$

because at $\beta = 0$ each logarithm equals $\log_2(2) = 1$ and the binomial probabilities sum to 1.

Furthermore we see that for large n the mutual information converges to 1.

Figure 3.1.: Left is MI vs T for n = 50, right is MI vs n for T = 5. We see that as T grows the MI converges to 0 and as n grows the MI converges to 1.
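As a sanity check on equation 3.2, the short sketch below (hypothetical helper names, not thesis code) evaluates $F_{\mathrm{Star}}(n, \beta)$ directly and reproduces both limits: the mutual information vanishes at high temperature and approaches 1 bit as $n$ grows.

```python
import numpy as np
from scipy.special import comb

def f_star(n, beta):
    """F_Star(n, beta) = 1 - sum_k C(n,k) q1^k (1-q1)^(n-k) log2(1 + e^(2*beta*(n-2k)))."""
    q1 = np.exp(beta) / (np.exp(beta) + np.exp(-beta))
    k = np.arange(n + 1)
    weights = comb(n, k) * q1**k * (1 - q1) ** (n - k)
    return 1.0 - np.sum(weights * np.log2(1.0 + np.exp(2.0 * beta * (n - 2 * k))))

print(f_star(50, 1 / 5.0))    # n = 50, T = 5
print(f_star(50, 1 / 100.0))  # high temperature: close to 0
print(f_star(500, 1 / 5.0))   # many neighbours: close to 1
```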

This network is not that interesting, since information can only travel a maximum distance of 1. In the next section we expand the star graph: each neighbour continues along its own path of equal length, so that information can travel over a greater distance $d > 1$.

3.1.2. Star path graph

Now suppose we have a star graph where each neighbour of the source $x_0$ continues along a path until layer $d$ is reached. [Illustration: the source $x_0$ with $n$ disjoint paths $x_0 \to x_i^1 \to \cdots \to x_i^d$.]

Define $X_d$ as the set of nodes in layer $d$. Then for all star path graphs we have conditional independence:

$$P(X_d \mid x_0 = a) = \prod_{i=1}^{n} P(x_i^d \mid x_0 = a), \qquad n = \text{number of nodes in layer } d.$$

Since $P(x_i^d = x_0 \mid x_0 = a)$ equals the probability of an even number of mismatches between the source node $x_0$ and the target node $x_i^d$ (see section 2.0.2, equations 2.1-2.3, where the mismatch probability is $q_{-1} = \frac{e^{-\beta}}{e^{\beta}+e^{-\beta}}$), we see that:

$$P(x_i^d = x_0 \mid x_0 = a) = r_d = \frac{1 + (2q_1 - 1)^d}{2}$$

Proceeding exactly as for the star graph but replacing $(q_{-1}, q_1)$ by $(1 - r_d, r_d)$ results in the following expression:

$$I(x_0; X_d) = 1 - \sum_{k=0}^{n} \binom{n}{k} r_d^k (1 - r_d)^{n-k} \log\!\left(1 + \left(\frac{r_d}{1 - r_d}\right)^{n-2k}\right)$$

Thus the mutual information between $x_0$ and $X_d$ of a star path graph is:

$$F_{\mathrm{Star\ path}}(n, \beta, d) = 1 - \sum_{k=0}^{n} \binom{n}{k} r_d^k (1 - r_d)^{n-k} \log\!\left(1 + \left(\frac{r_d}{1 - r_d}\right)^{n-2k}\right)$$

The only difference between $F_{\mathrm{Star\ path}}$ and $F_{\mathrm{Star}}$ is that the star path graph uses $(1 - r_d, r_d)$ in the sum terms instead of $(q_{-1}, q_1)$. Intuitively, the mutual information should drop as $d$ grows:

$$F(n_0, \beta_0, d_2) \le F(n_0, \beta_0, d_1), \qquad d_1 \le d_2.$$

In figure 3.2 one can see the behavior of the mutual information between $x_0$ and $X_d$.

Figure 3.2.: $F_{\mathrm{Star\ path}}(n, \tfrac{1}{T}, d)$.
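A minimal numerical version of $F_{\mathrm{Star\ path}}$ (again with hypothetical helper names) only needs to swap $q_1$ for $r_d$ in the star-graph formula, which makes the monotone decay in $d$ easy to check:

```python
import numpy as np
from scipy.special import comb

def f_star_path(n, beta, d):
    """F_StarPath(n, beta, d): the star-graph formula with q1 replaced by r_d."""
    q1 = np.exp(beta) / (np.exp(beta) + np.exp(-beta))
    rd = (1 + (2 * q1 - 1) ** d) / 2          # P(x_i^d = x_0)
    k = np.arange(n + 1)
    weights = comb(n, k) * rd**k * (1 - rd) ** (n - k)
    ratio = (rd / (1 - rd)) ** (n - 2 * k)
    return 1.0 - np.sum(weights * np.log2(1.0 + ratio))

# the mutual information decreases with the distance d, as expected
print([round(f_star_path(10, 1 / 2.0, d), 4) for d in range(1, 6)])
```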

To exploit the conditional independence, we now consider a graph where $x_0$ is connected by a path $p_{d-1}$ to $x_{d-1}$, which then bifurcates into $k$ paths of length 1. This network is called the fork; all nodes in $X_d$ are conditionally independent if one conditions on $x_{d-1}$. We study this graph because it gives the lower bound of the mutual information between $x_0$ and the nodes in layer $d$. Later in this thesis it will become clear why the fork is the lower bound.

3.1.3. Fork

The fork is a graph where the source node $x_0$ follows a single path and bifurcates into $k$ paths of length 1 at the second-to-last node ($x_{d-1}$). The figure below represents a fork with $k = 3$.

[Illustration: a fork, a path $x_0 \to \cdots \to x_{d-1}$ that bifurcates into $x_1^d, x_2^d, x_3^d$.]

Let us calculate $P(X_d \mid x_0 = a)$:

$$P(X_d \mid x_0 = a) = P(x_{d-1} = x_0 \mid x_0 = a)\, P(X_d \mid x_{d-1} = a) + P(x_{d-1} \ne x_0 \mid x_0 = a)\, P(X_d \mid x_{d-1} \ne a) = r_{d-1}\, P(X_d \mid x_{d-1} = a) + (1 - r_{d-1})\, P(X_d \mid x_{d-1} \ne a)$$

where $r_{d-1} = \frac{1 + (2q_1 - 1)^{d-1}}{2}$, calculated as in equations 2.1-2.3 of section 2.0.2 by counting the even numbers of mismatches between $x_0$ and $x_{d-1}$. Since $P(X_d \mid x_{d-1} = a)$ is distributed in the same way as a star graph, we can use the following substitution:

$$\sum_{X_d} p(X_d \mid x_0 = a) = \sum_{k=0}^{n(d)} \binom{n(d)}{k} \Big[ r_{d-1}\, q_a^k (1 - q_a)^{n(d)-k} + (1 - r_{d-1})(1 - q_a)^{k} q_a^{n(d)-k} \Big]$$

where $a \in \{-1, 1\}$ and

$$Z(a, n(d), d, k) = r_{d-1}\, q_a^k (1 - q_a)^{n(d)-k} + (1 - r_{d-1})(1 - q_a)^{k} q_a^{n(d)-k}, \qquad n(d) = \text{number of nodes in layer } d.$$

We see that:

$$I(x_0; X_d) = \sum_{x_0, X_d} p(x_0, X_d)\Big[\log p(X_d \mid x_0) - \log p(X_d)\Big] = \sum_{x_0} p(x_0) \sum_{X_d} p(X_d \mid x_0)\Big[\log p(X_d \mid x_0) - \log p(X_d)\Big]$$

$$= \sum_{x_0} p(x_0) \sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(x_0, n(d), d, k)\Big[\log Z(x_0, n(d), d, k) - \log\!\Big(\tfrac{1}{2}\big(Z(1, n(d), d, k) + Z(-1, n(d), d, k)\big)\Big)\Big]$$

$$= 1 + \sum_{x_0} p(x_0) \sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(x_0, n(d), d, k)\, \log\!\left(\frac{Z(x_0, n(d), d, k)}{Z(1, n(d), d, k) + Z(-1, n(d), d, k)}\right)$$

$$= 1 - \frac{1}{2}\sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(1, n(d), d, k)\log\!\left(1 + \frac{Z(-1, n(d), d, k)}{Z(1, n(d), d, k)}\right) - \frac{1}{2}\sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(-1, n(d), d, k)\log\!\left(1 + \frac{Z(1, n(d), d, k)}{Z(-1, n(d), d, k)}\right)$$

$$= 1 - \frac{1}{2}\sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(1, n(d), d, k)\log\!\left(1 + \frac{Z(-1, n(d), d, k)}{Z(1, n(d), d, k)}\right) - \frac{1}{2}\sum_{k=0}^{n(d)} \binom{n(d)}{n(d)-k} Z(1, n(d), d, n(d)-k)\log\!\left(1 + \frac{Z(-1, n(d), d, n(d)-k)}{Z(1, n(d), d, n(d)-k)}\right)$$

$$= 1 - \sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(1, n(d), d, k)\log\!\left(1 + \frac{Z(-1, n(d), d, k)}{Z(1, n(d), d, k)}\right)$$

Thus:

$$F_{\mathrm{Fork}}(n(d), \beta, d) = 1 - \sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(1, n(d), d, k)\log\!\left(1 + \frac{Z(-1, n(d), d, k)}{Z(1, n(d), d, k)}\right)$$

Intuitively, the mutual information between $x_0$ and $X_d$ is smaller in a fork than in a star path graph. This can be explained by the fact that in each layer between the source node $x_0$ and $X_d$, the star path graph has more nodes than the fork. Figure 3.3 illustrates this.

Figure 3.3.: A comparison between a fork (blue), a star path graph (orange), and the power law $\frac{\beta^{2d} n}{\ln(4)}$. Later in this thesis it will become clear why both graphs converge to the same power law.
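A direct numerical evaluation of $F_{\mathrm{Fork}}$ only needs the mixture terms $Z$; the sketch below (hypothetical helper names, assuming the closed form derived above) shows how the fork's mutual information decays with $d$.

```python
import numpy as np
from scipy.special import comb

def f_fork(n, beta, d):
    """F_Fork(n(d), beta, d): closed-form fork expression derived above."""
    q1 = np.exp(beta) / (np.exp(beta) + np.exp(-beta))
    r = (1 + (2 * q1 - 1) ** (d - 1)) / 2          # P(x_{d-1} = x_0)
    k = np.arange(n + 1)

    def z(qa):
        # mixture over whether x_{d-1} copied x_0 (weight r) or flipped (weight 1 - r)
        return r * qa**k * (1 - qa) ** (n - k) + (1 - r) * (1 - qa) ** k * qa ** (n - k)

    z1, zm1 = z(q1), z(1 - q1)
    return 1.0 - np.sum(comb(n, k) * z1 * np.log2(1.0 + zm1 / z1))

print([round(f_fork(4, 1.0, d), 4) for d in range(2, 7)])  # decays with d
```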

In the next section we calculate the mutual information of networks where the source node $x_0$ gives rise to multiple forks. The mutual information between $x_0$ and $X_d$ of these multiple forks will be used to verify the mutual information of the Cayley graphs constructed in section 3.1.5.

3.1.4. Multiple forks

A multifork is a graph where the source node $x_0$ has $n$ neighbours. Each of these neighbours ($x_i^1$) continues as in a star path graph along a path of length $d - 2$; at the last point of this path ($x^{d-1,i}$) the path bifurcates into $m_i$ nodes. [Illustration: a multifork with $n = 4$ branches and $\mathbf{m} = (1, 2, 1, 2)$.]

For multiple forks from a source node $x_0$, with $X_d = X_d^1\, \dot\cup\, \ldots\, \dot\cup\, X_d^n$, we see that:

$$P(X_d \mid x_0 = a) = \prod_{i=1}^{n} P(X_d^i \mid x_0 = a) = \prod_{i=1}^{n} Z_i(a, m_i, d, k_i)$$

with

$$Z_i(a, m_i, d, k) = r_{d-1}\, q_a^{k} (1 - q_a)^{m_i - k} + (1 - r_{d-1}) (1 - q_a)^{k} q_a^{m_i - k}$$

where $m_i$ is the number of bifurcation points of fork $i$, $n$ is the number of different forks, and $d$ is the distance from $x_0$ to $X_d^i$. Now we see that:

$$I(x_0; X_d) = \sum_{x_0, X_d} p(x_0, X_d)\Big[\log p(X_d \mid x_0) - \log p(X_d)\Big] = \sum_{x_0} p(x_0) \sum_{X_d} p(X_d \mid x_0)\Big[\log p(X_d \mid x_0) - \log p(X_d)\Big]$$

$$= \sum_{x_0} p(x_0) \sum_{k_1=0}^{m_1} \cdots \sum_{k_n=0}^{m_n} \binom{m_1}{k_1} \cdots \binom{m_n}{k_n} \prod_i Z_i(x_0, m_i, d, k_i)\, \log\!\left(\frac{\prod_i Z_i(x_0, m_i, d, k_i)}{\tfrac{1}{2}\left(\prod_i Z_i(1, m_i, d, k_i) + \prod_i Z_i(-1, m_i, d, k_i)\right)}\right)$$

$$= 1 - \sum_{k_1=0}^{m_1} \cdots \sum_{k_n=0}^{m_n} \binom{m_1}{k_1} \cdots \binom{m_n}{k_n} \prod_i Z_i(1, m_i, d, k_i)\, \log\!\left(1 + \frac{\prod_i Z_i(-1, m_i, d, k_i)}{\prod_i Z_i(1, m_i, d, k_i)}\right)$$

Thus:

$$F_{\mathrm{Multifork}}(\mathbf{m}, \beta, d) = 1 - \sum_{k_1=0}^{m_1} \cdots \sum_{k_n=0}^{m_n} \binom{m_1}{k_1} \cdots \binom{m_n}{k_n} \prod_i Z_i(1, m_i, d, k_i)\, \log\!\left(1 + \frac{\prod_i Z_i(-1, m_i, d, k_i)}{\prod_i Z_i(1, m_i, d, k_i)}\right)$$

where $\mathbf{m} = (m_1, \ldots, m_n)$.
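For small branch sizes, $F_{\mathrm{Multifork}}$ can be evaluated directly by looping over one index $k_i$ per fork; a minimal sketch (hypothetical helper names, not thesis code) is given below.

```python
import numpy as np
from itertools import product
from scipy.special import comb

def f_multifork(m, beta, d):
    """F_Multifork(m, beta, d) for branch sizes m = (m_1, ..., m_n)."""
    q1 = np.exp(beta) / (np.exp(beta) + np.exp(-beta))
    r = (1 + (2 * q1 - 1) ** (d - 1)) / 2

    def z(qa, mi, k):
        return r * qa**k * (1 - qa) ** (mi - k) + (1 - r) * (1 - qa) ** k * qa ** (mi - k)

    total = 0.0
    for ks in product(*(range(mi + 1) for mi in m)):   # one index k_i per fork
        w = np.prod([comb(mi, k) for mi, k in zip(m, ks)])
        z1 = np.prod([z(q1, mi, k) for mi, k in zip(m, ks)])
        zm1 = np.prod([z(1 - q1, mi, k) for mi, k in zip(m, ks)])
        total += w * z1 * np.log2(1.0 + zm1 / z1)
    return 1.0 - total

print(f_multifork((1, 2, 1, 2), beta=1.0, d=3))  # the example multifork above
```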

We now proceed to the mutual information between $x_0$ and $X_d$ for Cayley graphs, since these graphs can represent real-world networks.

3.1.5. Cayley graphs

A Cayley graph is a graph where each node has the same out-degree $\mathcal{D}$ and where every node in the network has only one incoming edge (except the source node $x_0$). Note that all Cayley graphs with 2 generations are multiple forks, so we can use the expression for the mutual information derived in section 3.1.4 to verify that the algorithms used in this chapter calculate the correct mutual information. [Illustration: a Cayley graph with $d = 3$ and $\mathcal{D} = 2$, a binary tree rooted at $x_0$ with layers $\{x_1^1, x_2^1\}$, $\{x_1^2, \ldots, x_4^2\}$, $\{x_1^3, \ldots, x_8^3\}$.]

With Cayley graphs, exploiting the conditional probability in this way is not possible. Therefore we cannot construct a closed-form expression for the mutual information analytically as we did in sections 3.1.1-3.1.4. We show two numerical methods that can be used to calculate the mutual information between $x_0$ and $X_d$.

First method

Suppose we have a Cayley graph where each node has out-degree $\mathcal{D}$. The number of nodes that take on value 1 in $X_d$ is a sum of binomially distributed variables $X_1 + \cdots + X_m$. Each variable in the sum is either $\mathrm{Bin}(\mathcal{D}, q_1)$ or $\mathrm{Bin}(\mathcal{D}, 1 - q_1)$, depending on the state of the parent node. Thus, when travelling from one layer to the next, it is important to know how many 1's the parent set contains. Now define:

$$X_k = \{x \in S \mid \mathrm{distance}(x, x_0) = k\}$$

The conditional probability can then be calculated as:

$$P(X_d \mid x_0 = a) = P(x_0 = a)\, P(X_1 \mid x_0 = a) \cdots P(X_d \mid X_{d-1}) = P(x_0 = a) \sum_{i_1} P(\{X_1 : \#1 = i_1\} \mid \{x_0 = a\}) \cdots \sum_{i_d} P(\{X_d : \#1 = i_d\} \mid \{X_{d-1} : \#1 = i_{d-1}\}) \qquad (3.3)$$

An algorithm to construct the probability $P(X_d \mid x_0 = a)$ follows these steps:

• First, map each layer to the number of ones it can contain. For example, if each node has degree 2, layer 2 has $2^2 = 4$ nodes, so layer 2 is mapped to $[0, 1, 2, 3, 4]$.

• For every parent $p$, map it to its child set $c$ as $\phi: p \to \mathcal{D}[0,1]$, where $\mathcal{D}[0,1] = [0,1] \times \cdots \times [0,1]$ ($\mathcal{D}$ times) $= [0,1]^{\mathcal{D}}$. In words, we map the parent $p$ to its $\mathcal{D}$ children; each child becomes a 1 with probability $q_1$ or $q_{-1}$, depending on the state of the parent.

• When moving from one layer to the next, separate the parents with state 1 from those with state $-1$, so that you know which distribution to use when determining the states of the children.

• Now loop through all possible combinations of $\mathcal{D}[0,1] \times \cdots \times \mathcal{D}[0,1]$ (one factor per parent) whose sum equals $S$, given that the sum of the parent nodes is some $k \in \mathbb{N}$. Summing the probabilities of all such combinations gives the term

$$P(\{X_{\text{child}} : \#1 = S\} \mid \{X_{\text{parent}} : \#1 = k\})$$

• To calculate the entropy, note that

$$P(X_d : \#1 = S) = \tfrac{1}{2}\left(P(X_d : \#1 = S \mid x_0 = 1) + P(X_d : \#1 = S \mid x_0 = -1)\right)$$

All terms on the right-hand side can be calculated with the algorithm.

• Calculate the entropy and conditional entropy as in appendix A, where $I(X; Y) = H(X) - H(X \mid Y)$.

The above algorithm has been compared with a Monte Carlo simulation of small networks for verification. On top of that, we compared it with the analytical expression for multiforks derived in section 3.1.4.
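A compact way to implement the counting idea of the first method is to propagate, layer by layer, the distribution of the number of 1's; the sketch below (hypothetical helper names, not the thesis implementation) does this for a Cayley tree and computes the mutual information between $x_0$ and the number of 1's in layer $d$, i.e. the summed representation also discussed later in this chapter.

```python
import numpy as np
from scipy.stats import binom

def layer_count_dist(prev_dist, n_parents, D, q1):
    """Distribution of #1-children given a distribution over #1-parents.
    Children of a +1 parent are 1 w.p. q1; children of a -1 parent w.p. 1-q1."""
    n_children = n_parents * D
    new = np.zeros(n_children + 1)
    for j, pj in enumerate(prev_dist):          # j = number of +1 parents
        if pj == 0:
            continue
        a = binom.pmf(np.arange(j * D + 1), j * D, q1)
        b = binom.pmf(np.arange((n_parents - j) * D + 1), (n_parents - j) * D, 1 - q1)
        new += pj * np.convolve(a, b)
    return new

def cayley_layer_mi(D, d, beta):
    """I(x0; #ones in layer d) for a Cayley tree with out-degree D."""
    q1 = np.exp(beta) / (np.exp(beta) + np.exp(-beta))
    dists = {}
    for a, start in ((1, np.array([0.0, 1.0])), (-1, np.array([1.0, 0.0]))):
        dist, n = start, 1
        for _ in range(d):
            dist = layer_count_dist(dist, n, D, q1)
            n *= D
        dists[a] = dist
    mix = 0.5 * (dists[1] + dists[-1])
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))   # entropy in bits
    return h(mix) - 0.5 * (h(dists[1]) + h(dists[-1]))

print(cayley_layer_mi(D=2, d=3, beta=1.0))
```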

Second method

Calculating the mutual information with the first method can take a long time, since one must loop through all possible combinations. The next method concentrates on an analytical calculation using probability generating functions. Define:

$$G(z) = \mathbb{E}[z^X] = \sum_{x=0}^{\infty} p(x)\, z^x$$

Assume that the root is generation 0 and let $X^{(n,1)}, \ldots, X^{(n,\mathcal{D}^n)}$ be the random variables of generation $n$. Let $A_n = \sum_{i=1}^{\mathcal{D}^n} X^{(n,i)}$, and denote by $\phi_{q_1}$ the probability generating function of $\mathrm{Bin}(\mathcal{D}, q_1)$ and by $\phi_{q_{-1}}$ that of $\mathrm{Bin}(\mathcal{D}, q_{-1})$. Then we see that:

$$G_n(z) = \mathbb{E}\!\left[z^{\sum_i X^{(n,i)}}\right] \qquad (3.4)$$

$$= \mathbb{E}\!\left[z^{X^{(n,1)} + \cdots + X^{(n,\mathcal{D}^n)}}\right] \qquad (3.5)$$

$$= \mathbb{E}\!\left[\phi_{q_1}^{X^{(n-1,1)}}(z)\, \phi_{q_{-1}}^{1 - X^{(n-1,1)}}(z) \cdots \phi_{q_1}^{X^{(n-1,\mathcal{D}^{n-1})}}(z)\, \phi_{q_{-1}}^{1 - X^{(n-1,\mathcal{D}^{n-1})}}(z)\right] \qquad (3.6)$$

$$= \phi_{q_{-1}}^{\mathcal{D}^{n-1}}(z)\; \mathbb{E}\!\left[\left(\frac{\phi_{q_1}(z)}{\phi_{q_{-1}}(z)}\right)^{\sum_i X^{(n-1,i)}}\right] \qquad (3.7)$$

$$= c_{n-1}\, G_{n-1}(s_{n-1}(z)) \qquad (3.8)$$

where the step from 3.5 to 3.6 follows from the law of total expectation, $\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]$, and in 3.8 we define:

$$c_{n-1} = \phi_{q_{-1}}^{\mathcal{D}^{n-1}}(z), \qquad s_{n-1}(z) = \frac{\phi_{q_1}(z)}{\phi_{q_{-1}}(z)}$$

Using this recursive formula one gets:

$$G_n(z) = c_{n-1}\, G_{n-1}(s_{n-1}(z)) = \cdots = c_{n-1} \cdots c_0\, G_0(s_0(z)) = \prod_{i=0}^{n-1} c_i \cdot \left(\tfrac{1}{2} + \tfrac{1}{2} s_0(z)\right)$$

If we want to recover all probabilities we compute:

$$P(X = k) = \frac{G^{(k)}(0)}{k!}$$

Example 3.1.1. Suppose one has the following graph: [Illustration: a source $x_0$ with two children $x_1^1$ and $x_2^1$.]

Then we see that:

$$\mathbb{E}\!\left[z^{\sum X^{(1,i)}}\right] = \phi_{q_{-1}}(z)\; \mathbb{E}\!\left[\left(\frac{\phi_{q_1}(z)}{\phi_{q_{-1}}(z)}\right)^{X^{(0)}}\right] = \phi_{q_{-1}}(z)\, \frac{1}{2}\left(1 + \frac{\phi_{q_1}(z)}{\phi_{q_{-1}}(z)}\right) = \tfrac{1}{2}\phi_{q_1}(z) + \tfrac{1}{2}\phi_{q_{-1}}(z)$$

Now we see that:

• $P\!\left(\sum X^{(1,i)} = 0\right) = \frac{G^{(0)}(0)}{0!} = \tfrac{1}{2} q_{-1}^2 + \tfrac{1}{2} q_1^2$

• $P\!\left(\sum X^{(1,i)} = 1\right) = \frac{G^{(1)}(0)}{1!} = 2\, q_1 q_{-1}$

• $P\!\left(\sum X^{(1,i)} = 2\right) = \frac{G^{(2)}(0)}{2!} = \tfrac{1}{2} q_{-1}^2 + \tfrac{1}{2} q_1^2$

For the conditional probabilities a similar calculation can be made; the only difference is that the term $G_0$ changes into:

$$G_0(s_0(z)) = \begin{cases} 1, & \text{if conditioned on } x_0 = -1 \\ s_0(z), & \text{if conditioned on } x_0 = 1 \end{cases}$$
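Example 3.1.1 can be reproduced symbolically; the sketch below (a generic sympy script, not thesis code) builds the mixture of the two binomial generating functions and extracts the probabilities via derivatives at $z = 0$.

```python
import sympy as sp

z, q1 = sp.symbols('z q1', positive=True)   # q1 = P(child = 1 | parent = 1)
qm1 = 1 - q1                                # q_{-1} = P(child = 1 | parent = -1)
D = 2                                       # the root has two children

# PGFs of Bin(D, q1) and Bin(D, q_{-1}); G is the mixture over the uniform root
phi_q1 = (1 - q1 + q1 * z) ** D
phi_qm1 = (1 - qm1 + qm1 * z) ** D
G = sp.Rational(1, 2) * phi_q1 + sp.Rational(1, 2) * phi_qm1

# P(sum = k) = G^(k)(0) / k!; equivalent to the three bullet points above
probs = [sp.expand(sp.diff(G, z, k).subs(z, 0) / sp.factorial(k)) for k in range(D + 1)]
print(probs)
```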

Using this method one can obtain all probabilities and conditional probabilities. The mutual information can then be calculated using appendix A.

To verify these methods of calculating the mutual information between $x_0$ and $X_d$, one can compare the mutual information of the Cayley graph with that of a multifork. Note that the Cayley graph with out-degree 2 and distance 2 is the same as a multifork with 2 forks that both bifurcate at $d = 1$ into $k = 2$ components. In figure 3.4 we verify that the mutual information of these two networks is the same by plotting the relative error. The relative error is very small, except when we compare the mutual information of the Cayley graph of degree 4 and distance 2 to its corresponding multifork expression derived in section 3.1.4.

Figure 3.4.: We compare the Cayley graphs constructed with method 1 (the algorithm) with the Cayley graph constructed via the analytical derivation of the multifork. The relative error increases as the degree increases.

The difference in mutual information at degree 4 could be explained by machine precision errors. If the probability of an event is small enough, $p = \epsilon$, the computer software returns $p \log(p) = 0$. This means that when one calculates the mutual information

$$I(X; Y) = H(X) - H(X \mid Y),$$

individual states with small enough probabilities are effectively neglected. The problem is that for large enough networks, many of these small probabilities appear in the probability distribution, so the result underestimates the actual mutual information. For example, for the Cayley graph with $\mathcal{D} = 4$ and $d = 2$, the 16 nodes in layer $d = 2$ give $2^{16} = 65536$ joint configurations, many of which have very small probability.

In the computation for the Cayley graphs we take the sum of the binomially distributed variables $X_1, \ldots, X_m$. By taking the sum, the size of the state space is reduced, which means that the number of events with a very small probability is reduced. In the same example, the sum of the nodes with state 1 in layer $d = 2$ has just 17 possible values instead of the $2^{16} = 65536$ joint states of the individual units.

4. Analysis of the mutual information

In this section we perform an in-depth analysis of the mutual information of the networks calculated in chapter 3. It will become clear how the mutual information behaves and which properties a node must have to be the most influential within the network, given that its distribution is $(\tfrac{1}{2}, \tfrac{1}{2})$.

4.1. Star graph

As we saw in section 3.1.1, the mutual information of a star graph between $x_0$ and $X_d$ can be expressed as:

$$F_{\mathrm{Star}}(n, \beta) = 1 - \sum_{k=0}^{n} \binom{n}{k} q_1^k (1 - q_1)^{n-k} \log\!\left(1 + \left(\frac{q_1}{1 - q_1}\right)^{n-2k}\right), \qquad q_1 = \frac{e^{\beta}}{e^{\beta} + e^{-\beta}}$$

With the help of a Taylor expansion around $\beta = 0$ we see that:

$$1 - \sum_{k=0}^{n} \binom{n}{k} q_1^k (1 - q_1)^{n-k} \log\!\left(1 + \left(\frac{q_1}{1 - q_1}\right)^{n-2k}\right) = \frac{\beta^2 n}{\ln(4)} + \mathcal{O}(\beta^4) \approx \frac{\beta^2 n}{\ln(4)}$$

Since $F_{\mathrm{Star}}(n, \beta)$ is S-shaped in both dimensions $\beta$ and $n$, with upper bound 1, we can deduce the following properties:

$$F_{\mathrm{Star}}(n, \beta) = \begin{cases} \frac{\beta^2 n}{\ln(4)} & \text{if } \beta \to 0,\ n \to 0 \\ 1 & \text{if } \beta \to \infty,\ n \to \infty \end{cases}$$

and it is monotonically increasing in both $n$ and $\beta$. Thus $F_{\mathrm{Star}}(n, \beta)$ lies in the following set:

$$\mathcal{S}_{(n,\beta)} = \left\{ s(n, \beta) \;\middle|\; \left.\frac{\partial s}{\partial \beta}\right|_{(0,0)} = \left.\frac{\partial}{\partial \beta}\frac{\beta^2 n}{\ln(4)}\right|_{(0,0)},\ \left.\frac{\partial s}{\partial n}\right|_{(0,0)} = \left.\frac{\partial}{\partial n}\frac{\beta^2 n}{\ln(4)}\right|_{(0,0)},\ 0 < \frac{\partial s}{\partial \beta}, \frac{\partial s}{\partial n},\ \forall \beta, n > 0:\ 0 \le s(n, \beta) \le 1 \right\}$$

In words, a function $s(n, \beta) \in \mathcal{S}_{(n,\beta)}$ behaves like $\frac{\beta^2 n}{\ln(4)}$ near the origin $(0, 0)$, is monotonically increasing, and takes values between 0 and 1.

Some functions that are also in $\mathcal{S}_{(n,\beta)}$ are the following activation functions, which behave like $\frac{\beta^2 n}{\ln(4)}$ near the origin:

• $\tanh\!\left(\frac{\beta^2 n}{\ln(4)}\right)$, $\ \mathrm{erf}\!\left(\frac{\sqrt{\pi}}{2}\frac{\beta^2 n}{\ln(4)}\right)$

• $\frac{\beta^2 n / \ln(4)}{1 + \left|\beta^2 n / \ln(4)\right|}$, $\ \frac{\beta^2 n / \ln(4)}{\sqrt{1 + \left(\beta^2 n / \ln(4)\right)^2}}$, $\ \frac{2}{\pi}\arctan\!\left(\frac{\pi}{2}\frac{\beta^2 n}{\ln(4)}\right)$

In figure 4.1 we plot each of these activation functions, denoted $A(n, \tfrac{1}{T}) \in \mathcal{S}$, divided by $F_{\mathrm{Star}}(n, \tfrac{1}{T})$. Note how all fits are perfect in the tail $(T, n) \to (\infty, \infty)$ and in the beginning $(T, n) \to (0, 0)$.

Figure 4.1.: $\frac{A(n, 1/T)}{F_{\mathrm{Star}}(n, 1/T)}$ against $T$ and $n$ for all activation functions. We see that the fits are perfect in the beginning and in the tail.

We see that $F_{\mathrm{Star}}(n, \beta) \le \tanh\!\left(\frac{\beta^2 n}{\ln(4)}\right)$, as the green line in figure 4.1 lies above 1. To improve the fit, we need to apply a coordinate transformation to the tanh so that the function becomes smaller but behaves the same near the origin $(T, n) \to (0, 0)$ and in the tail $(T, n) \to (\infty, \infty)$. This transformation $h(n, \beta)$ must satisfy the following conditions:

• $h(n, \beta) = \frac{\beta^2 n}{\ln(4)}$ as $(n, \beta) \to (0, 0)$; this ensures the same behavior around $(0, 0)$.

• $\lim_{\beta, n \to \infty} h(n, \beta) = \infty$; this ensures convergence to the same constant.

• $0 \le \frac{\partial h}{\partial \beta}, \frac{\partial h}{\partial n} \le \frac{\partial}{\partial \beta}\frac{\beta^2 n}{\ln(4)}, \frac{\partial}{\partial n}\frac{\beta^2 n}{\ln(4)}$; this ensures that $h(n, \beta) \le \frac{\beta^2 n}{\ln(4)}$ and that $h$ is monotonically increasing.

For an $h(n, \beta)$ that satisfies these conditions we have:

$$h(n, \beta) \le \frac{\beta^2 n}{\ln(4)} \implies A(h(n, \beta)) \le A\!\left(\frac{\beta^2 n}{\ln(4)}\right)$$

Finding the ideal $h(n, \beta)$ such that $\frac{A(h(n, \beta))}{F_{\mathrm{Star}}(n, \beta)} \approx 1$ is a difficult task, since there are infinitely many choices for $h(n, \beta)$. An initial guess is $h(n, \beta) = \ln\!\left(1 + \frac{\beta^2 n}{\ln(4)}\right)$, which transforms $\tanh\!\left(\frac{\beta^2 n}{\ln(4)}\right)$ into $\tanh\!\left(\ln\!\left(1 + \frac{\beta^2 n}{\ln(4)}\right)\right)$. This transformation results in a better fit, as we see in figure 4.2.

Figure 4.2.: Effect of $h$ on tanh, where we compare $\frac{\tanh(\beta^2 n / \ln(4))}{F_{\mathrm{Star}}(n, \beta)}$ (blue) and $\frac{\tanh(h(n, \beta))}{F_{\mathrm{Star}}(n, \beta)}$ (orange), with $h(n, \beta) = \ln\!\left(1 + \frac{\beta^2 n}{\ln(4)}\right)$.

4.2. Star path graph

We have seen that the mutual information of a star path graph can be expressed as:

$$F_{\mathrm{Star\ path}}(n, \beta, d) = 1 - \sum_{k=0}^{n} \binom{n}{k} r_d^k (1 - r_d)^{n-k} \log\!\left(1 + \left(\frac{r_d}{1 - r_d}\right)^{n-2k}\right), \qquad r_d = \frac{1 + (2q_1 - 1)^d}{2}$$

With the help of a Taylor expansion around $\beta = 0$ we see that:

$$F_{\mathrm{Star\ path}}(n, \beta, d) = \begin{cases} \frac{\beta^{2d} n}{\ln(4)} & \text{if } \beta \to 0,\ n \to 0 \\ 1 & \text{if } \beta \to \infty,\ n \to \infty \end{cases}$$

It is monotonically increasing in both $n$ and $\beta$ and monotonically decreasing in $d$. In the dimensions $n$ and $\beta$ the mutual information is S-shaped. The following figure shows that $F_{\mathrm{Star\ path}}(n, \beta, d)$ behaves like $\frac{\beta^{2d} n}{\ln(4)}$ near the origin.

Figure 4.3.: $\frac{\beta^{2d} n}{\ln(4)}$ (orange) and $F_{\mathrm{Star\ path}}(n, \beta, d)$ (blue) behave the same around the origin for $d = 2$.

Recall that $F_{\mathrm{Star\ path}}(n, \beta, d)$ is S-shaped in the dimensions $\beta$ and $n$, and that $F_{\mathrm{Star\ path}}(n, \beta, d) \in \mathcal{S}_{(n,\beta,d)}$:

$$\mathcal{S}_{(n,\beta,d)} = \left\{ s(n, \beta, d) \;\middle|\; \left.\frac{\partial s}{\partial \beta}\right|_{(0,0,d)} = \left.\frac{\partial}{\partial \beta}\frac{\beta^{2d} n}{\ln(4)}\right|_{(0,0,d)},\ \left.\frac{\partial s}{\partial n}\right|_{(0,0,d)} = \left.\frac{\partial}{\partial n}\frac{\beta^{2d} n}{\ln(4)}\right|_{(0,0,d)},\ 0 < \frac{\partial s}{\partial \beta}, \frac{\partial s}{\partial n},\ \forall \beta, n, d > 0:\ 0 \le s(n, \beta, d) \le 1 \right\}$$

The activation functions that behave like $\frac{\beta^{2d} n}{\ln(4)}$ near the origin are also in $\mathcal{S}_{(n,\beta,d)}$. We can use the same method as in section 4.1, where a coordinate transformation $h(n, \beta, d)$ must be found such that $\frac{\tanh(h(n, \beta, d))}{F_{\mathrm{Star\ path}}(n, \beta, d)} \approx 1$. Here $h(n, \beta, d)$ must have the same properties as $h(n, \beta)$ in the dimensions $n$ and $\beta$.

In figure 4.4 we show $\frac{\tanh(h(n, \beta, d))}{F_{\mathrm{Star\ path}}(n, \beta, d)}$, where $h(n, \beta, d) = \frac{1}{d}\ln\!\left(1 + \frac{d\, \beta^{2d} n}{\ln(4)}\right)$. The higher we choose $d$, the worse the fit becomes.

Figure 4.4.: The extra dimension $d \in [2, 5]$ results in a worse fit, since $\frac{\tanh(h(n, \beta, d))}{F_{\mathrm{Star\ path}}(n, \beta, d)}$ gets further away from 1 as $d$ grows.

4.3. Fork

For forks we derived the expression:

$$F_{\mathrm{Fork}}(n(d), \beta, d) = 1 - \sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(1, n(d), d, k) \log\!\left(1 + \frac{Z(-1, n(d), d, k)}{Z(1, n(d), d, k)}\right)$$

where

$$Z(a, n(d), d, k) = r_{d-1}\, q_a^k (1 - q_a)^{n(d)-k} + (1 - r_{d-1})(1 - q_a)^k q_a^{n(d)-k}$$

Let us define:

$$X_{G_1}^k = \{x \in G_1 \mid \mathrm{distance}(x, x_0) = k\}$$

The number of nodes in each consecutive layer of the fork is smaller than in any other graph $G_1$, given that each node in $G_1$ has only one incoming edge (this restriction prevents paths from overlapping). In mathematical terms:

$$|X_{\mathrm{Fork}}^k| \le |X_{G_1}^k| \qquad \forall k \in \{0, 1, \ldots, d\},\ \forall G_1 \text{ in which each node has one incoming edge.}$$

This implies that $X_{\mathrm{Fork}}^k \subseteq X_{G_1}^k$ and thus:

$$I(x_0; X_{\mathrm{Fork}}^d) \le I(x_0; X_{G_1}^d)$$

This means that the mutual information of the fork connecting $x_0$ and $X_d$ is a lower bound on the mutual information between $x_0$ and $X_d$.

By the same reasoning, assuming that each node has only one incoming edge, the graph with the most nodes in each consecutive layer leading to $X_d$ is the star path graph, because the star path graph always has the maximum possible number of nodes ($|X_d|$) in each consecutive layer. Thus the upper bound of the mutual information between $x_0$ and $X_d$ is attained by the mutual information of a star path graph with $|X_d|$ paths of length $d$.

In summary, for all networks in which each node in the system has one incoming edge we have:

$$F_{\mathrm{Fork}}(n(d), \beta, d) \le I(x_0; X_d) \le F_{\mathrm{Star\ path}}(n(d), \beta, d)$$
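The bound is easy to spot-check numerically; the self-contained sketch below (hypothetical helper names, assuming the closed-form expressions derived in chapter 3) compares $F_{\mathrm{Fork}}$ and $F_{\mathrm{Star\ path}}$ on a small parameter grid.

```python
import numpy as np
from scipy.special import comb

def q(beta):
    return np.exp(beta) / (np.exp(beta) + np.exp(-beta))

def f_star_path(n, beta, d):
    rd = (1 + (2 * q(beta) - 1) ** d) / 2
    k = np.arange(n + 1)
    w = comb(n, k) * rd**k * (1 - rd) ** (n - k)
    return 1.0 - np.sum(w * np.log2(1.0 + (rd / (1 - rd)) ** (n - 2 * k)))

def f_fork(n, beta, d):
    q1 = q(beta)
    r = (1 + (2 * q1 - 1) ** (d - 1)) / 2
    k = np.arange(n + 1)
    z = lambda qa: r * qa**k * (1 - qa) ** (n - k) + (1 - r) * (1 - qa) ** k * qa ** (n - k)
    z1, zm1 = z(q1), z(1 - q1)
    return 1.0 - np.sum(comb(n, k) * z1 * np.log2(1.0 + zm1 / z1))

# F_Fork <= F_StarPath for the same n(d), beta and d on this grid
for beta in (0.2, 0.5, 1.0):
    for d in (2, 3, 5):
        assert f_fork(8, beta, d) <= f_star_path(8, beta, d) + 1e-12
print("bound holds on the sampled grid")
```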

4.4. Multiple forks

For multiple forks we derived the expression:

$$F_{\mathrm{Multifork}}(\mathbf{m}, \beta, d) = 1 - \sum_{k_1=0}^{m_1} \cdots \sum_{k_n=0}^{m_n} \binom{m_1}{k_1} \cdots \binom{m_n}{k_n} \prod_i Z(1, m_i, d, k_i)\, \log\!\left(1 + \frac{\prod_i Z(-1, m_i, d, k_i)}{\prod_i Z(1, m_i, d, k_i)}\right)$$

with

$$Z(a, m_i, d, k) = r_{d-1}\, q_a^k (1 - q_a)^{m_i - k} + (1 - r_{d-1})(1 - q_a)^k q_a^{m_i - k}$$

where $i \in \{1, \ldots, n\}$ is the index of the $i$-th neighbour of $x_0$ and $m_i$ is the number of nodes of fork $i$ at distance $d$. Recall that the figure in section 3.1.4 shows a multifork with $n = 4$ and $\mathbf{m} = (1, 2, 1, 2)$.

When we compare the mutual information between $x_0$ and $X_d$ of forks, multiple forks, and star path graphs, we expect the inequalities:

$$F_{\mathrm{Fork}}(n(d), \beta, d) \le F_{\mathrm{Multifork}}(\mathbf{m}, \beta, d) \le F_{\mathrm{Star\ path}}(n(d), \beta, d), \qquad \mathbf{m} = (m_1, \ldots, m_k),\ \sum_i m_i = n(d)$$

Figure 4.5 shows that these inequalities hold. Furthermore, all graphs converge to the same power law as $T \to \infty$, which can be seen in figure 4.6.

Figure 4.5.: MI per $n$. Note that green ≤ yellow ≤ blue and that all lines converge to the same power law (red). At $n = 2$ the star path graph and the multifork lie on top of each other.

4.5. Cayley graphs

Since the mutual information of these graphs was computed with an algorithm (the first or second method in section 3.1.5), we were not able to construct a closed-form expression for the mutual information between $x_0$ and $X_d$. However, one can still analyze its behavior. Because the algorithms developed to compute the mutual information of Cayley graphs take a long time, we compare the Cayley graph with distance 3 and degree 2 to forks, star path graphs, and multiple forks (the colored graphs in figure 4.7).

Figure 4.7.: The colored networks illustrated on the right-hand side all have the same distance $d = 3$; the corresponding mutual information between $x_0$ and $X_d$ is shown on the left-hand side.

Figure 4.7 makes clear that graphs with the most nodes in the layers between the source node $x_0$ and the nodes in layer $d$ ($X_d$) have the highest mutual information. The fork is the lower bound of the mutual information, since it has only 1 node in each layer between $x_0$ and $X_d$ (red). The star path graph has the highest mutual information, since in each layer it has the maximum number of nodes (blue). The mutual information of the other two graphs (green and yellow) is bounded by that of the fork and the star path graph. Since the Cayley graph (yellow) has more nodes at $d = 2$ than the multifork (green), the mutual information of the Cayley graph is greater than or equal to that of the multifork.

Now let us compare graphs $G_1$ and $G_2$ where the number of nodes in each layer of $G_2$ need not be greater than or equal to the number of nodes in each layer of $G_1$. In mathematical terms, it is not the case that:

$$\forall k \in \{0, \ldots, n\}: \quad |X_{G_1}^k| \le |X_{G_2}^k|$$

where $|X_{G_1}^k|$ is the cardinality of the set of nodes in layer $k$ of $G_1$. An example of two such graphs: $G_1$ is a Cayley graph with $\mathcal{D} = 2$ and $G_2$ is a multifork; both have 8 nodes in layer $d = 3$. [Illustration: the Cayley graph $G_1$ and the multifork $G_2$.] In the figure below we see that the mutual information of $G_1$ is lower than that of $G_2$ for low $T$. However, as $T$ grows, the mutual information of $G_1$ overtakes that of $G_2$ at some point $T_0 \approx 1.93$.

of G1is lower than that of G2for low T . However if T grows, the mutual information

of G1overtakes the mutual information of G2at some point T0≈1.93.

Figure 4.8.: Here we see that the Cayley graph has a smaller mutual information than the multifork for low temperatures, however after T = 2 the Cayley graph has a greater mutual information than the multifork.

Thus in summary, the mutual information between x0andXd is higher for a network

G2than that of a network G1if in each consecutive layer G2has more nodes than G1.

|Xk G1 | ≤ |Xk G2 |, ∀k =⇒ I(x0;Xd G1) ≤ I(x0;X d G2)

5. Approximations

In this section we approximate the mutual information of the star graph, $F_{\mathrm{Star}}(n, \beta)$, and the star path graph, $F_{\mathrm{Star\ path}}(n, \beta, d)$. An approximation can make the computation much faster, since the summation terms in the derived expressions for the mutual information become computationally expensive when $n$ is large. In the approximation of the mutual information of the star path graph we will see which problems the extra dimension $d$ causes. Since all other expressions for the mutual information derived in chapter 3 have the same problem as the star path graph, we do not discuss them.

5.1. Star graph

Recall that for star graphs the mutual information $F_{\mathrm{Star}}(n, \beta)$ behaves like the activation functions with slope $\frac{\beta^2 n}{\ln(4)}$ near the origin. Let us define the region where $\frac{A(n, \beta)}{F_{\mathrm{Star}}(n, \beta)} \ne 1$ as $C$, and choose

$$A(n, \beta) = \tanh\!\left(\frac{\beta^2 n \left(18 - 9\beta^2 n + \beta^4 (4 + 6(n-1)n)\right)}{36 \ln(2)}\right)$$

where the argument of the tanh is the 6th-degree Taylor polynomial of $F_{\mathrm{Star}}$ around $\beta = 0$. The goal is to find a fit for $F_{\mathrm{Star}}(n, \beta)$ in the region $C$, so that we can define our approximation as:

$$G(n, \beta) = \begin{cases} \tanh\!\left(\frac{\beta^2 n \left(18 - 9\beta^2 n + \beta^4 (4 + 6(n-1)n)\right)}{36 \ln(2)}\right) & \text{if not in } C \\[4pt] \mathrm{FIT}(n, \beta) & \text{if in } C \end{cases}$$

Since $C$ is unknown, we need to guess a region $C^* \subset C$ in which we can find a good fit. After some experimentation, $C^*$ can be chosen as:

$$C^*(n) = \left\{ 0.5\sqrt{\tfrac{n}{\ln(4)}} \le T \le 2\sqrt{\tfrac{n}{\ln(4)}} \right\} \subset C(n)$$

Now use the least-squares method to fit the following function over the region $R = \bigcup_{i=1}^{100} C^*(i)$:

$$a + b\tanh\!\left(c\, \beta^{d} n^{e}\right)$$

This results in a set of fitted parameters.

Thus:

$$G(n, \beta) = \begin{cases} a + b\tanh\!\left(c\, \beta^{d} n^{e}\right) & \text{if } 0.5\sqrt{\tfrac{n}{\ln(4)}} \le \tfrac{1}{\beta} \le 2\sqrt{\tfrac{n}{\ln(4)}} \\[4pt] \tanh\!\left(\frac{\beta^2 n \left(18 - 9\beta^2 n + \beta^4 (4 + 6(n-1)n)\right)}{36 \ln(2)}\right) & \text{elsewhere} \end{cases}$$

The maximum relative error of this fit $G(n, \beta)$ is equal to $0.0270203 \approx 3\%$ for $n \in [1, 500]$ and $T \in \mathbb{R}^+$. We see that the error grows slightly as $n$ grows, implying that this fit will probably not work if $n$ is chosen high enough.

Figure 5.1.: The piecewise fit $G(n, \beta)$; the gridlines represent the region $C^*(50)$ (top) and $C^*(500)$ (bottom).
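The least-squares step itself can be sketched as follows (hypothetical helper names; the fitted values are not the parameters reported in the thesis, and the sampled region is the guessed $C^*$):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import comb

def f_star(n, beta):
    q1 = np.exp(beta) / (np.exp(beta) + np.exp(-beta))
    k = np.arange(n + 1)
    w = comb(n, k) * q1**k * (1 - q1) ** (n - k)
    return 1.0 - np.sum(w * np.log2(1.0 + np.exp(2.0 * beta * (n - 2 * k))))

def model(X, a, b, c, d, e):
    beta, n = X
    return a + b * np.tanh(c * beta**d * n**e)

# sample F_Star inside the guessed transition region C*(n) for several n
betas, ns, targets = [], [], []
for n in range(1, 101):
    T_lo, T_hi = 0.5 * np.sqrt(n / np.log(4)), 2.0 * np.sqrt(n / np.log(4))
    for T in np.linspace(T_lo, T_hi, 15):
        betas.append(1.0 / T)
        ns.append(n)
        targets.append(f_star(n, 1.0 / T))

params, _ = curve_fit(model, (np.array(betas), np.array(ns)), np.array(targets),
                      p0=[0.0, 1.0, 1.0, 2.0, 1.0], maxfev=20000)
print(params)  # fitted (a, b, c, d, e); values depend on the sampled region
```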

5.2. Star path graph

For the star path graph, recall that $F_{\mathrm{Star\ path}}(n, \beta, d)$ behaves like $\frac{\beta^{2d} n}{\ln(4)}$ near the origin $(0, 0, d)$, and that as $d$ grows the error of the fit $\tanh(h(n, \beta, d))$ grows.

Using the same least-squares method as for the star graph fails here, because the region where $\frac{A(n, \beta, d)}{F_{\mathrm{Star\ path}}(n, \beta, d)} \ne 1$ also depends on the dimension $d$. Therefore the fit must be done on a larger region, which is computationally expensive. On top of that, the extra dimension $d$ makes it harder to guess which function to fit to $F_{\mathrm{Star\ path}}(n, \beta, d)$.

After some experimentation with transformations we can construct the following bounds for $F_{\mathrm{Star\ path}}(n, \beta, d)$:

$$\tanh\!\left(\beta\left(\frac{n}{\ln(4)}\right)^{\frac{1}{2d}}\right)^{2d} \le F_{\mathrm{Star\ path}}(n, \beta, d) \le \tanh\!\left(\frac{\beta^{2d} n}{\ln(4)}\right)$$

Figure 5.2.: $\frac{\tanh\left(\beta^{2d} n / \ln(4)\right)}{F_{\mathrm{Star\ path}}(n, \beta, d)}$ and $\frac{\tanh\left(\beta\, (n/\ln(4))^{1/(2d)}\right)^{2d}}{F_{\mathrm{Star\ path}}(n, \beta, d)}$ for $d \in [2, 5]$ and $n = 50$.

To mitigate the effect of the dimension $d$ on the mutual information, one can construct:

$$\mathrm{Fit}(n, \beta, d) = \tanh\!\left(\beta^{\frac{2d}{\psi(d)}}\left(\frac{n}{\ln(4)}\right)^{\frac{1}{\psi(d)}}\right)^{\psi(d)}$$

where $\psi(d)$ is some function. The upper bound is recovered when $\psi(d) = 1$ and the lower bound when $\psi(d) = 2d$. In figure 5.3 we see that the region where $F_{\mathrm{Star\ path}}(n, \beta, d) \ne \mathrm{Fit}(n, \beta, d)$ has become smaller for $\psi(d) = 2d - 2$. However, the error still grows as $d$ grows.

Figure 5.3.: $\frac{\mathrm{Fit}(n, \beta, d)}{F_{\mathrm{Star\ path}}(n, \beta, d)}$ for $\psi(d) = 2d - 2$. Note that the region where $F_{\mathrm{Star\ path}}(n, \beta, d) \ne \mathrm{Fit}(n, \beta, d)$ has become smaller compared to the fits with the upper and lower bounds.

Let us now assume that $\psi(d)$ is not linear, and introduce an adjustment parameter $\alpha(n, \beta, d)$ into the fit:

$$\mathrm{Fit}(n, \beta, d) = \tanh\!\left(\beta^{\frac{2d}{\psi(d)\,\alpha(n, \beta, d)}}\left(\frac{n}{\ln(4)}\right)^{\frac{1}{\psi(d)\,\alpha(n, \beta, d)}}\right)^{\psi(d)\,\alpha(n, \beta, d)}$$

One can use the least-squares method to find the ideal $\alpha(n_0, \beta, d_0)$ in the region $T \in [0.1, 10]$, where $n_0$ and $d_0$ are fixed. The figure below shows the results.

Figure 5.4.: Left: the best fit for $\alpha(n, \tfrac{1}{T}, d)$ per $n$ for specified $d$. Right: $\frac{\mathrm{Fit}(50, \tfrac{1}{T}, d)}{F_{\mathrm{Star\ path}}(50, \tfrac{1}{T}, d)}$.

We conclude that fitting a good function to $F_{\mathrm{Star\ path}}(n, \beta, d)$ is very hard. We have been able to reduce the error with a few useful transformations, but the error of the fit keeps growing with the dimension $d$. The fork, multifork, and Cayley graph all have the same problem, since these networks also depend on $d$. In appendix B another method is tried to approximate $F_{\mathrm{Star\ path}}(n, \beta, d)$, using the De Moivre-Laplace theorem.

6. Conclusion

For specific networks based on the kinetic Ising model, we have been able to derive the mutual information between a source node $x_0$ and the nodes in layer $d$:

• $F_{\mathrm{Star}}(n, \beta) = 1 - \sum_{k=0}^{n} \binom{n}{k} q_1^k (1 - q_1)^{n-k} \log\!\left(1 + \left(e^{2\beta}\right)^{n-2k}\right)$

• $F_{\mathrm{Star\ path}}(n, \beta, d) = 1 - \sum_{k=0}^{n} \binom{n}{k} r_d^k (1 - r_d)^{n-k} \log\!\left(1 + \left(\frac{r_d}{1 - r_d}\right)^{n-2k}\right)$

• $F_{\mathrm{Fork}}(n(d), \beta, d) = 1 - \sum_{k=0}^{n(d)} \binom{n(d)}{k} Z(1, n(d), d, k) \log\!\left(1 + \frac{Z(-1, n(d), d, k)}{Z(1, n(d), d, k)}\right)$

• $F_{\mathrm{Multifork}}(\mathbf{m}, \beta, d) = 1 - \sum_{k_1=0}^{m_1} \cdots \sum_{k_n=0}^{m_n} \binom{m_1}{k_1} \cdots \binom{m_n}{k_n} \prod_i Z(1, m_i, d, k_i) \log\!\left(1 + \frac{\prod_i Z(-1, m_i, d, k_i)}{\prod_i Z(1, m_i, d, k_i)}\right)$

These derivations were possible because we could exploit the conditional independence of the conditional probability:

$$p(X_d \mid x_0 = a) = \prod_{i=1}^{n} p(X_d^i \mid x_0 = a)$$

Furthermore, two algorithms have been constructed to compute the mutual information between $x_0$ and $X_d$ for Cayley graphs.

The first method is an algorithm that iterates over all possible paths from $x_0 = a$ to $X_d$ such that the number of ones in $X_d$ equals some $k$. By calculating the probability of each such path and summing them, one obtains:

$$P(X_d : \#1 = k \mid x_0 = a)$$

Doing this for all $k \in \{0, \ldots, |X_d|\}$ and $a \in \{-1, 1\}$, one can easily calculate the mutual information between $x_0$ and $X_d$.

The second method uses recursion. Its goal is to find the probability generating function $G_n(z)$ of the last generation, which contains $\mathcal{D}^n$ nodes. Using $G_n(z)$, one can construct the probability distributions needed to calculate the mutual information, since:

$$P(X = k) = \frac{G^{(k)}(0)}{k!}$$
