
can easily be between -100 and 100. There are, however, two possible solutions to this problem.

The first is to divide the reward by the maximum clique for DTKC or the maximum weighted clique for DTKWC. There are algorithms that find these cliques either exactly or through approximation (Boppana and Halldórsson, 1992; Warren and Hicks, 2006). However, because both problems are NP-hard, finding these cliques can be computationally expensive depending on the graph, which can slow down training significantly when graphs are generated during training. Another drawback of this solution is that it only applies to DTKC and DTKWC. Therefore, we decided to focus on another solution.

The second solution is to scale the reward by a scalar value ρ such that the reward range stays closer to 0. Cappart et al. (2018) proposed this solution for their deep RL algorithm for the maximum cut problem and the maximum independent set problem. They argued that it improved training because gradient descent struggles with sparse and large rewards. We also decided to implement this scaling for our algorithm because it allows the agent to learn problems other than DTKC and DTKWC. The final reward function is thus equation 3.1, with ρ being the scalar value:

π‘Ÿπ‘‘= 𝜌(

score(𝑠𝑑+1) βˆ’score(𝑠𝑑))

. (3.1)
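As a minimal sketch of equation 3.1 (the `score` values and the choice ρ = 0.01 here are hypothetical, not tuned values from our experiments), the scaling could be implemented as:

```python
def scaled_reward(score_next, score_curr, rho=0.01):
    """Equation 3.1: r_t = rho * (score(s_{t+1}) - score(s_t)).

    rho shrinks the raw score difference so rewards stay close to 0;
    the default 0.01 is only an illustrative choice.
    """
    return rho * (score_next - score_curr)

# A raw score jump of 25 becomes a small reward of about 0.25.
print(scaled_reward(65.0, 40.0))
```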

The last two equations show the specific reward function for each problem we will train DCCA on. The reward function for DTKC (equation 3.2) is the difference between the sizes of the new and the old coverage. For DTKWC (equation 3.3), it is the difference between the summed weights of the new and the old coverage.

π‘ŸDTKC𝑑 = 𝜌(|

||Cov(

ξˆ°π‘‘+1)|

|| βˆ’|||Cov(

ξˆ°π‘‘)|

||)

(3.2)

π‘ŸDTKWC𝑑 = 𝜌

βŽ›βŽœ

⎜⎝

βŽ›βŽœ

⎜⎝

βˆ‘

π‘£βˆˆCov(ξˆ°π‘‘+1) 𝑀(𝑣)

⎞⎟

⎟⎠

βˆ’

βŽ›βŽœ

⎜⎝

βˆ‘

π‘’βˆˆCov(ξˆ°π‘‘) 𝑀(𝑒)

⎞⎟

⎟⎠

⎞⎟

⎟⎠

(3.3)
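A small sketch of both reward functions, assuming cliques are represented as sets of vertex ids and `w` is a dict of vertex weights (all names and values here are illustrative, not from our implementation):

```python
def coverage(clique_set):
    """Cov(D): the set of vertices that appear in at least one clique of D."""
    return set().union(*clique_set) if clique_set else set()

def reward_dtkc(d_next, d_curr, rho=1.0):
    """Equation 3.2: scaled change in coverage size."""
    return rho * (len(coverage(d_next)) - len(coverage(d_curr)))

def reward_dtkwc(d_next, d_curr, w, rho=1.0):
    """Equation 3.3: scaled change in the total weight of the coverage."""
    return rho * (sum(w[v] for v in coverage(d_next))
                  - sum(w[u] for u in coverage(d_curr)))

# Two overlapping cliques: adding c2 only contributes vertex 5 to the coverage.
c1, c2 = frozenset({1, 2, 3, 4}), frozenset({3, 4, 5})
w = {1: 2.0, 2: 1.0, 3: 1.0, 4: 3.0, 5: 2.0}
print(reward_dtkc([c1, c2], [c1]))      # coverage grows from 4 to 5 vertices
print(reward_dtkwc([c1, c2], [c1], w))  # covered weight grows by w[5] = 2.0
```

Note that because the rewards are defined over the coverage rather than over the individual cliques, the shared vertices 3 and 4 are counted only once.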

Figure 3.1: This figure shows our network design. We explain how the Graph Encoder functions in section 3.2.1. This network gets a graph G as input and outputs the latent node encodings Z_G. These encodings are then used at each step in the episode by the actor and critic networks, which we explain in section 3.2.2.

3.2.1 Graph Encoder

The task of the first network is to encode the whole graph such that the structural information of the graph is captured in latent vectors for the nodes. Therefore, we needed to find a GNN architecture that captures this information and to select input features that help it do so. We decided to use a Graph Isomorphism Network (GIN) (Xu et al., 2018a) as our GNN architecture, based on its usage by Abe et al. (2019). They demonstrated that an RL agent could learn to find the maximum clique in a given graph using a GIN.

In their paper, Abe et al. (2019) used five GIN layers with a hidden dimension of 32, each containing an MLP. These MLPs also consisted of five layers, with input and output dimensions of 32 and a hidden dimension of 16. As input, they used a vector of ones, which helped capture the structural information of the graph. Other research also shows that this method can capture the relevant structural information (Cui et al., 2021).

For DCCA, we decided to use a network setup similar to that of Abe et al. (2019). We tested different numbers of GIN layers and hidden dimension sizes for both the GIN layers and the MLPs within them. We state the final setup in our hyperparameter section (table 4.8).
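For intuition, a single GIN layer reduces to a neighbourhood sum followed by an MLP. A toy NumPy sketch of this update (the one-line ReLU "MLP" and the tiny path graph are illustrative stand-ins, not our actual encoder):

```python
import numpy as np

def gin_layer(h, adj, mlp, eps=0.0):
    """One GIN update: h_v' = MLP((1 + eps) * h_v + sum of neighbour features).

    h is an (n, d) node-feature matrix, adj an (n, n) 0/1 adjacency matrix,
    and mlp any callable mapping (n, d) arrays to new feature arrays.
    """
    return mlp((1 + eps) * h + adj @ h)

# Path graph 0-1-2 with all-ones input features, as in Abe et al. (2019).
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h = np.ones((3, 4))
out = gin_layer(h, adj, mlp=lambda x: np.maximum(x, 0.0))
print(out[:, 0])  # the middle node has two neighbours, so its value is larger
```

Even with identical all-ones inputs, the output already distinguishes nodes by degree, which illustrates how the encoder extracts structural information from featureless nodes.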

The graph encoder encodes the latent representations of the nodes z ∈ Z_G of a given graph G. The actor and critic networks use Z_G as their input at each step. We based this setup on the encode-process-decode paradigm (Cappart et al., 2021), which states that multiple computations can be done on the same latent space.

The main downside of using GIN is that the architecture is computationally heavy compared to other architectures such as Graph Convolutional Networks (Kipf and Welling, 2017) and Graph Attention Networks (Veličković et al., 2017). However, the GIN encoder network is only run once for each graph, so this computational cost is insignificant during evaluation, although it does increase the training time.

3.2.2 Actor-Critic Network

Both the actor and critic networks use a GIN architecture. We decided to use the virtual node method for our subgraph-level tasks. The upcoming paragraphs explain both networks and their inputs.

Actor Network

The actor network takes each clique as input and thus uses k + 1 virtual nodes: one for each clique in the current candidate clique set D_t and one for the newfound clique C_t. It is essential to note that the inputs of the clique nodes are independent of one another. This independence means two virtual nodes can share the same nodes as input, but they do not communicate. Figure 3.2 shows an example of this procedure.

π‘₯1

π‘₯2

π‘₯3

π‘₯4

π‘₯5 π‘₯6

π‘₯7

π‘₯8 π‘₯9

𝐢′

1 𝐢′

2 𝐢′

3

Figure 3.2: This figure shows the actor input for three different cliques, with 𝐢1 = {π‘₯1, π‘₯2, π‘₯3, π‘₯4}

, 𝐢2={

π‘₯3, π‘₯4, π‘₯5}

and 𝐢3={

π‘₯7, π‘₯8, π‘₯9}

. 𝐢1and 𝐢2share the nodes π‘₯3and π‘₯4, and therefore both have their latent encoding 𝑧𝑖 βˆˆξ‰†ξˆ³as input.

Equation 3.4 shows the input for the actor network. The 0 in the calculation would normally be the node's own embedding, but because we use virtual nodes, it is 0 here. The actor network collects the latent node encodings z_u ∈ Z_G of each node u ∈ C_i in a clique. It does this for all the cliques in the current candidate clique set C ∈ D_t and for the newfound clique C_t. Therefore, the final output of the actor network is X ∈ ℝ^{k+1}.

C'_i = MLP((1 + ε) · 0 + Σ_{u ∈ C_i} z_u)    (3.4)
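A sketch of the virtual-node readout in equation 3.4, with the (1 + ε) · 0 self term written out explicitly (the one-line `mlp` and the one-hot toy encodings are hypothetical stand-ins, not our trained networks):

```python
import numpy as np

def actor_scores(cliques, z, mlp, eps=0.0):
    """Equation 3.4 per clique: C'_i = MLP((1 + eps) * 0 + sum of z_u, u in C_i).

    cliques is a list of k + 1 vertex-id sets, z an (n, d) matrix of latent
    node encodings, and mlp a callable mapping a d-vector to a single score.
    Each clique is read out independently, matching figure 3.2.
    """
    return np.array([mlp((1 + eps) * 0.0 + z[sorted(c)].sum(axis=0))
                     for c in cliques])

z = np.eye(5)  # toy latent encodings: one one-hot vector per node
scores = actor_scores([{0, 1, 2}, {2, 3}], z, mlp=lambda v: float(v.sum()))
print(scores)  # one score per clique; under this toy mlp, the clique sizes
```

Node 2 contributes its encoding to both cliques, but the two virtual-node sums are computed separately, reflecting the independence stated above.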

Critic Network

The critic network uses a single virtual node with Cov(D_t ∪ {C_t}) as input. This virtual node outputs the value of that state. This method is based on the algorithm of Zhang et al. (2020), who used a similar setup for their critic network. Figure 3.3 shows an example of this process.

π‘₯1

π‘₯2

π‘₯3

π‘₯4

π‘₯5 π‘₯6

π‘₯7

π‘₯8 π‘₯9

Μ‚ 𝑣

Figure 3.3: The figure shows how ̂𝑣(𝑠𝑑)is calculated by collating the latent encodings of the cliques in figure 3.2, with 𝑠𝑑={

𝐢1, 𝐢2, 𝐢3}

Equation 3.5 shows the input for the critic network: the coverage of the current candidate clique set D_t and the newfound clique C_t. The critic network collects all the latent node encodings z_u ∈ Z_G of the nodes in the coverage of the current state, u ∈ Cov(s_t), where Cov(s_t) = Cov(D_t ∪ {C_t}). The final output is then a single value v̂(s_t) ∈ ℝ.

v̂(s_t) = MLP((1 + ε) · 0 + Σ_{u ∈ Cov(D_t ∪ {C_t})} z_u)    (3.5)
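Analogously, the critic readout of equation 3.5 sums the encoding of every covered node exactly once, regardless of how many cliques contain it (again a toy sketch with hypothetical stand-ins for the MLP and encodings):

```python
import numpy as np

def critic_value(clique_set, new_clique, z, mlp, eps=0.0):
    """Equation 3.5: v(s_t) = MLP((1 + eps) * 0 + sum of z_u over Cov(D_t and C_t)).

    clique_set is the candidate set D_t, new_clique the newfound clique C_t,
    z an (n, d) matrix of latent encodings, and mlp maps a d-vector to a value.
    """
    cov = set().union(*clique_set, new_clique)  # each covered node counted once
    return mlp((1 + eps) * 0.0 + z[sorted(cov)].sum(axis=0))

z = np.eye(5)  # toy latent encodings: one one-hot vector per node
v = critic_value([{0, 1, 2}], {2, 3}, z, mlp=lambda x: float(x.sum()))
print(v)  # node 2 appears in both cliques but is only summed once
```

This single-virtual-node design is what lets the critic estimate the value of the whole state rather than of any individual clique.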