
Deep Clique Comparison Agent (DCCA)

Critic Network

The critic network uses a single virtual node with $\mathrm{Cov}(\mathcal{C}_t \cup \{C_t\})$ as input. This virtual node outputs the value of that state. This design is based on the algorithm of Zhang et al. (2020), who used a similar setup for their critic network. Figure 3.3 shows an example of this process.

π‘₯1

π‘₯2

π‘₯3

π‘₯4

π‘₯5 π‘₯6

π‘₯7

π‘₯8 π‘₯9

Μ‚ 𝑣

Figure 3.3: The figure shows how ̂𝑣(𝑠𝑑)is calculated by collating the latent encodings of the cliques in figure 3.2, with 𝑠𝑑={

𝐢1, 𝐢2, 𝐢3}

Equation 3.5 shows the input for the critic network: the coverage of the current candidate clique set $\mathcal{C}_t$ and the newfound clique $C_t$. The critic network collects the latent node encodings $z_u \in \mathcal{Z}_{\mathcal{G}}$ of all nodes in the coverage of the current state, $u \in \mathrm{Cov}(s_t)$, where $\mathrm{Cov}(s_t) = \mathrm{Cov}(\mathcal{C}_t \cup \{C_t\})$. The final output is a single value $\hat{v}(s_t) \in \mathbb{R}$.

$$\hat{v}(s_t) = \mathrm{MLP}\left( (1 + \epsilon) \cdot 0 + \sum_{u \in \mathrm{Cov}(\mathcal{C}_t \cup \{C_t\})} z_u \right) \qquad (3.5)$$
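Equation 3.5 is a GIN-style update of a virtual node whose own feature vector is zero, so only the sum over the coverage remains before the MLP. The following is a minimal PyTorch sketch of this readout; the class name, layer sizes and two-layer MLP are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn as nn

class CriticReadout(nn.Module):
    """Sketch of Equation 3.5: a virtual node with zero features is updated
    GIN-style from all nodes in Cov(C_t ∪ {C_t}), then an MLP maps the
    pooled encoding to a scalar state value."""

    def __init__(self, latent_dim: int, eps: float = 0.0):
        super().__init__()
        self.eps = eps
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, latent_dim),
            nn.ReLU(),
            nn.Linear(latent_dim, 1),
        )

    def forward(self, z: torch.Tensor, coverage: torch.Tensor) -> torch.Tensor:
        # z: (num_nodes, latent_dim) latent node encodings Z_G from the graph encoder
        # coverage: LongTensor with the indices of the nodes in Cov(C_t ∪ {C_t})
        virtual_feature = torch.zeros(z.size(1), device=z.device)
        pooled = (1 + self.eps) * virtual_feature + z[coverage].sum(dim=0)
        return self.mlp(pooled)  # v̂(s_t) ∈ R
```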

Algorithm 4: The clique composing operation of DCCA

1: function ComposeCliqueSet(graph $\mathcal{G}$, actor weights $\theta_a$, graph encoder weights $\theta_{\mathcal{G}}$)
2:     $\mathcal{C} \leftarrow \emptyset$   ▷ Initialise an empty candidate clique set
3:     $\mathcal{Z}_{\mathcal{G}} \leftarrow \theta_{\mathcal{G}}(\mathcal{G})$   ▷ Gather the node latent features for the whole graph
4:     for $C \in \mathrm{PivotBronKerbosch}(\mathcal{G})$ do
5:         if $|\mathcal{C}| < k$ then   ▷ The first $k$ cliques are added
6:             $\mathcal{C} \leftarrow \mathcal{C} \cup \{C\}$
7:         else
8:             $\mathcal{C}^* \leftarrow \mathcal{C} \cup \{C\}$
9:             $a \leftarrow \pi_{\theta_a}(\mathcal{C}^*, \mathcal{Z}_{\mathcal{G}})$   ▷ The actor decides which clique should be removed
10:            $\mathcal{C} \leftarrow \mathcal{C}^* \setminus \{\mathcal{C}^*_a\}$
11:        end if
12:    end for
13:    return $\mathcal{C}$
14: end function

In Algorithm 4, we show how DCCA composes a clique set from a graph. At the start of the run, the algorithm initialises an empty candidate clique set. It uses the graph encoder network to generate the latent node encodings $\mathcal{Z}_{\mathcal{G}}$ for the whole graph, which we reuse as input for the actor network at each iteration. Cappart et al. (2021) argue that the encode-process-decode paradigm can be used for algorithmic reuse, which we do here, or for multi-task learning, for which we make recommendations in our discussion.

Reusing the latent encodings speeds up the execution of DCCA significantly, because calculating the latent node encodings of the whole graph is the computationally heaviest task, and it should also improve the scalability of DCCA.

We use the pivot Bron-Kerbosch algorithm (Cazals and Karande, 2008) to find all the maximal cliques in the graph. The algorithm always adds the first $k$ cliques to the candidate clique set $\mathcal{C}$. When $|\mathcal{C}| = k$, a decision clique set $\mathcal{C}^*$ is created by adding the newfound clique to $\mathcal{C}$. The actor network then uses $\mathcal{C}^*$ and the latent node features $z_n \in \mathcal{Z}_{\mathcal{G}}$ as input to decide which clique should be removed from $\mathcal{C}^*$. This clique is removed, and $\mathcal{C}$ becomes $\mathcal{C}^*$ without the removed clique. This process repeats until every clique in graph $\mathcal{G}$ has been checked, at which point the current candidate clique set is returned as the final clique set.
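The sketch below illustrates this loop in Python. The `encoder` and `actor` callables are hypothetical placeholders for the trained networks; NetworkX's `find_cliques` enumerates maximal cliques with a pivoting Bron-Kerbosch variant, matching the clique enumeration we use (Section 3.3.2).

```python
import networkx as nx

def compose_clique_set(graph: nx.Graph, encoder, actor, k: int) -> list:
    """Sketch of Algorithm 4: keep a candidate set of at most k cliques and
    let the actor decide which clique to drop whenever a new maximal clique
    arrives. `encoder` and `actor` stand in for the trained networks."""
    candidate = []              # candidate clique set C, initially empty
    z_g = encoder(graph)        # latent node encodings Z_G, computed once and reused

    for clique in nx.find_cliques(graph):   # maximal cliques (pivot Bron-Kerbosch)
        if len(candidate) < k:
            candidate.append(clique)         # the first k cliques are always added
        else:
            decision = candidate + [clique]                  # C* = C ∪ {C_t}
            a = actor(decision, z_g)                         # index of clique to remove
            candidate = [c for i, c in enumerate(decision) if i != a]
    return candidate
```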

[Flowchart: Start → Initialise Weights → Generate Graph $\mathcal{G}$ → Run Episode (Algorithm 4) → Training Examples → Generate Batches (Section 3.3.1) → Train Epoch → did $E$ epochs? → done training? → Stop.]

Figure 3.4: The flowchart of the training procedure of DCCA.

Figure 3.4 shows the training procedure of DCCA. We start by initialising the weights of all the networks: the graph encoder, the actor network and the critic network. After that, the training loop starts. At the beginning of each loop, we generate a graph, which is then used to create training examples by running an episode with Algorithm 4.

We generate batches from those training examples with a custom batching algorithm, which we explain in Section 3.3.1. The batches are shuffled at the start of each epoch, and DCCA trains for $E$ epochs on them. After that, it either starts a new episode with a newly generated graph or stops training once it has trained for a certain threshold of steps.
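The outer loop of Figure 3.4 can be summarised as follows. All callables in this sketch (`generate_graph`, `run_episode`, `make_batches`, `ppo_update`) are hypothetical placeholders for the actual DCCA components, and the stopping criterion is simplified to a fixed step budget.

```python
import random

def train(generate_graph, run_episode, make_batches, ppo_update,
          num_steps: int, epochs_per_episode: int) -> None:
    """Sketch of the training procedure in Figure 3.4."""
    step = 0
    while step < num_steps:                   # stop once the step budget is used
        graph = generate_graph()              # a fresh graph for each episode
        examples = run_episode(graph)         # Algorithm 4 produces training examples
        batches = make_batches(examples)      # custom batching from Section 3.3.1
        for _ in range(epochs_per_episode):   # train E epochs on this episode
            random.shuffle(batches)           # batches are shuffled each epoch
            for batch in batches:
                ppo_update(batch)             # one PPO gradient step
                step += 1
```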

Besides PPO and other policy gradient algorithms, the literature review chapter also discussed two other deep RL algorithms: DQN and Neural MCTS. Both DQN and Neural MCTS would have been good choices; however, they are significantly slower to train or require many more resources.

Compared to PPO, DQN is in most cases slower to train but more sample efficient, and is therefore the better choice when gathering data is computationally expensive.

For DTKC and DTKWC, this is not the case: for both problems, we can gather unlimited data without it being computationally heavy. DQN is comparatively even slower for our approach than PPO, because PPO can gather data for a whole episode while reusing the output of the graph encoder $\mathcal{Z}_{\mathcal{G}}$. With DQN this would not be possible, because DQN performs a gradient update after each action taken, which changes the weights of the graph encoder.

We decided not to use Neural MCTS because, like DQN, it is slower to train and needs significantly more resources. Moreover, because Neural MCTS is a recent development, few implementations exist, and it would thus be considerably harder to implement.

3.3.1 Batching Algorithm

Most batching methods for training a GNN architecture can be divided into two groups: batching for graph-level tasks and batching for node-level tasks. For graph-level tasks, the batching algorithm combines multiple graphs into a single graph for one forward pass. For node-level tasks, multiple node-level predictions on a single graph are combined into one pass; for example, the node labels of all nodes are predicted in a single forward pass. However, DCCA performs a subgraph-level task, for which no batching algorithm exists. Therefore, we designed a new batching algorithm to improve the training time significantly.

As previously mentioned, we collect $k+1$ clique outputs for the actor network and a single coverage output for the value network through a GNN architecture from a latent encoded graph. The new batching algorithm aims to perform a single pass of the graph encoder network for the graph and then perform $B$ steps on this latent encoded graph, where $B$ is the batch size. The batching algorithm therefore collects $B$ steps performed on the same graph, such that $B(k+1)$ clique inputs and $B$ coverage inputs are gathered. The network's output is thus two vectors: $\mathbf{X} \in \mathbb{R}^{B(k+1) \times 1}$, the output of the actor network, and $\mathbf{V} \in \mathbb{R}^{B \times 1}$, the output of the value network. $\mathbf{X}$ does not yet have the desired dimensions, because we want the cliques of the same step in a single row, so the $B(k+1) \times 1$ vector has to be transformed into a $B \times (k+1)$ matrix.

The algorithm therefore creates a batch index vector $\mathbf{I} \in \mathbb{Z}_{+}^{B(k+1) \times 1}$ that collects the clique outputs of a single step in one row, such that we obtain a $B \times (k+1)$ matrix.
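One way to realise this regrouping is to stable-sort the actor outputs by their batch index and reshape the result. This is a hedged PyTorch sketch; the function name and the sort-based approach are illustrative and not necessarily the exact implementation.

```python
import torch

def group_actor_outputs(x: torch.Tensor, batch_index: torch.Tensor,
                        batch_size: int, k: int) -> torch.Tensor:
    """Regroup the actor output X of shape (B*(k+1), 1) into a (B, k+1)
    matrix, where batch_index assigns each clique output to its step."""
    # Stable sort keeps the within-step order of the k + 1 clique outputs.
    order = torch.argsort(batch_index, stable=True)
    return x.squeeze(1)[order].view(batch_size, k + 1)
```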

3.3.2 Software and Hardware

We implemented DCCA in Python using PyTorch and PyTorch Geometric (Paszke et al., 2019; Fey and Lenssen, 2019). We used NetworkX to find the cliques in a graph (Hagberg et al., 2008). We ran the experiments on a Ryzen 5 2600 six-core 3.4 GHz CPU with 16 GB of RAM and an Nvidia GeForce RTX 2060 GPU.

We based our algorithm on the code from Kostrikov (2018).