Satisfiability solving for matching problems

Academic year: 2021


Layout: typeset by the author using LaTeX.

Satisfiability solving

for matching problems

using SAT to solve many-to-many matching problems

Sander Vonk 11868627

Bachelor thesis
Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
Dr. R. de Haan

Institute for Logic, Language and Computation Faculty of Science

University of Amsterdam Science Park 107 1098 XG Amsterdam

Abstract

This thesis investigates the possibility of solving many-to-many matching problems with one-sided preferences using satisfiability solving. The main question of this thesis is the following: how can high-quality matchings for many-to-many two-sided matching problems be found using satisfiability solving? To find these matchings, theoretical research into matching problems and SAT solving was conducted first, followed by an experiment. For the experiment, five data sets were used, containing bidding data of reviewers on submissions they could review. Using Python, and the z3 library in particular, logical rules were set up to translate this matching problem into a SAT problem. These rules were then combined with a variable bid score distribution to represent the yes, maybe and no bids in the data. Furthermore, a cardinality constraint (in the form of a maximum number of submissions per reviewer) was added. The logical rules and variable values were then used to compute a model of matches for each of the files using the z3 SAT algorithm, optimising the total bid score of the matches. The created models were evaluated using the social welfare bid score of the matches, together with other values such as the computed equity and the algorithm running time. SAT was found to be a satisfactory technique for solving this type of problem, but the success rate depends strongly on the properties of the data set. The size of a data set strongly impacted the running time of the algorithm, which limits the fitness of SAT for finding matches in large data sets. Properties such as the ratio between reviewers and submissions also affect the number of reviews each submission gets for a given maximum number of submissions per reviewer, which makes it very difficult to find a good balance in cardinality for reviewers and submissions in some files. Another property that influences matching quality is the ratio of yes bids in a file. Because of all these factors, finding high-quality matches can be challenging. Nevertheless, high-quality matchings can be found using SAT, although the time needed to find a solution appears to increase exponentially with the number of rows in a data set.

Contents

1 Introduction 4

1.1 Many-to-many matching . . . 4

1.2 Satisfiability solving . . . 5

1.3 SAT for the reviewers/submissions problem . . . 6

2 Theoretical framework 7

2.1 Matching problems . . . 7

2.1.1 Stable Marriage problem . . . 8

2.1.2 Hospitals/residents problem . . . 8

2.1.3 House Allocation problem . . . 9

2.1.4 Finding high quality matches for one-sided preference . . . . 9

2.2 Satisfiability solving . . . 10

2.2.1 SAT algorithms . . . 11

2.2.2 NP-complete . . . 11

2.2.3 SAT for matching problems . . . 12

3 Approach 13

3.1 The data . . . 13

3.2 The experiment . . . 15

3.2.1 Logical rules . . . 15

3.2.2 The programming . . . 17

3.2.3 Optimization . . . 17

3.3 Algorithm output . . . 19

4 Results 20

4.1 First results . . . 20

4.2 Specific research . . . 25

4.2.1 Data set 2.1 . . . 25

4.2.2 Data set 2.2 . . . 27

4.2.3 Data set 2.3 . . . 28

4.2.5 Data sets combined . . . 31

5 Conclusion 36

6 Future research 39

Bibliography 40

Chapter 1

Introduction

There are numerous instances where agents (anyone or anything that partakes in a matching problem) on one side need to be matched to agents on another side; these are called two-sided matching problems [Sotomayor, 1999]. Based on cardinality (the number of agents that an agent can be matched with), there are three main types of two-sided matching problems: one-to-one matchings, one-to-many matchings and many-to-many matchings [Manlove, 2013]. There are also variations and influences that differentiate matching problems within these categories. For example, agents might have preferences. There might also be a time limit for agents to match with certain other agents, which can influence matchings. Preferences of agents can be expressed in several different ways. Agents might have a list of a fixed number of preferred matches, they might have given each possible match some sort of ranking, or they might have made a list of all agents in their order of preference [Manlove, 2013].

While many combinations and variations on cardinality and preferences exist, there are some basic examples that describe common matching problems. These include the stable marriage problem for one-to-one matchings and the hospitals/residents problem for one-to-many matchings [Manlove, 2013]. These matching problem types will be discussed later.

1.1

Many-to-many matching

An example of a many-to-many matching problem is matching submissions in an academic conference to reviewers. Each article needs multiple reviewers, but each reviewer has their own area of expertise, and therefore their own preferences for the articles they want to review. Each reviewer will be assigned to multiple articles, yet some reviewers may only find a few submissions to be within their area of preference. Their preferred submissions to review may also overlap with those of other reviewers. It is desirable to get multiple reviews on each submission. Moreover, it is best if the number of reviews does not vary too much between submissions. Therefore, it is not always possible to only assign papers to reviewers that are within their area of expertise. The problem of matching reviewers with their preferences to submissions will be referred to as the reviewers/submissions (matching) problem from here on.

One approach to solve the problem would be to randomly allocate papers to reviewers in such a way that every researcher gets an equal number of papers. However, this would most likely result in a non-optimal allocation, since reviewers would have to review papers on subjects that they might know very little about. Another approach might be to ensure that reviewers get as many papers within their area of expertise as possible, which could be a better way to solve the matching problem. However, there might still be submissions that are reviewed by reviewers who are unfamiliar with the subject of the paper. One could also choose to minimize the allocation of submissions to reviewers that know little about the subject of a submission.

Another consideration is the minimum and maximum number of submissions per reviewer and the minimum and maximum number of reviewers per submission. These numbers describe the cardinality. Changing the cardinality changes the matches that are made. A high cardinality results in more matches, but comes with the risk of a lower average match quality. On the other hand, a lower cardinality usually comes with a higher average match quality, but fewer matches will be made [Manlove, 2013]. This is an important consideration when designing a matching algorithm.

In matching problems, there is rarely a most satisfying solution. The quality of a solution to a matching problem is often subjective; there are various ways to solve them, each solution has its own advantages and disadvantages [Brandt et al., 2016].

1.2

Satisfiability solving

The focus of this research will be on finding an optimal solution for the reviewers/submissions problem. Instead of using traditional algorithms to solve this matching problem, satisfiability solving will be used. SAT or satisfiability solving is an approach where a problem is translated into a Boolean satisfiability problem, which is then solved instead [Biere et al., 2009]. To solve a SAT problem, one must set all variables in a Boolean formula to either true or false in such a way that the total formula evaluates to true. A more in-depth explanation of satisfiability solving will be given in chapter 2.2.

An advantage of SAT solving is that there already exist many algorithms for solving SAT problems, so a problem only needs to be translated into a SAT problem, instead of an algorithm having to be designed for the specific problem in order to solve it [Biere et al., 2009]. SAT solvers are commonly used for solving problems like Sudoku or the N queens problem [Bright et al., 2019]. While SAT is not commonly used for solving many-to-many matching problems, it has been used successfully for solving one-to-one matching problems [Drummond et al., 2015]. In this thesis, the possibility of using SAT solving for many-to-many matching will be tested.

1.3

SAT for the reviewers/submissions problem

In chapter three, the approach of this research will be described in detail. Here, a short overview will be given of the steps taken to answer the research question of this thesis:

How to find high quality matchings for many-to-many two-sided matching problems using satisfiability solving?

The answer will mainly be sought using Python, and the z3 library in particular. Z3 is a theorem prover from Microsoft Research that can be used for SAT solving, among other things [Bjørner and de Moura, 2019]. The SAT solver from z3 will be used to find matches between reviewers and submissions in data sets published by PrefLib [Mattei and Walsh, 2013]. In these data sets, reviewers indicate to what extent they want to review each of the available submissions. Using satisfiability solving, a distribution of articles among reviewers will be made. To get the most desirable matches, combinations of different cardinalities and weights for the degrees of preference of reviewers will be tried.

While SAT is not known for solving this problem, it should be able to solve it, since the problem can be expressed using logical rules. These rules can be fed to a SAT solver, which will then come up with an assignment of variables that represents all the matches that have been made. Moreover, the problem to be solved is NP-complete, like SAT itself. In short, this means that it should be possible to convert this problem into SAT, and that it should therefore be solvable by a SAT solver. A more detailed explanation will be given in chapter 2.2.2.

Chapter 2

Theoretical framework

In this chapter, an overview will be given of matching problems and satisfiability solving in general, with more in-depth information about two-sided many-to-many matching problems under preference and about using satisfiability solving to solve matching problems.

2.1

Matching problems

A matching problem is the problem of assigning agents to other agents. Agents can be people, but can also be institutions or objects. While solving a matching problem may seem a simple process at first, the difficulty is to take the preferences of agents into account. Additionally, there might be other restrictions in the form of cardinality and variation in the number of groups of agents. In this thesis, however, only two-sided matching problems will be considered, where agents of two disjoint groups are matched with one another.

Many real-world two-sided matching problems have hundreds or thousands of participating agents. There are many examples where this is the case: matching students to projects, patients to hospitals, and matching men and women on dating websites. Because these matchings usually have significant consequences for people, it is important to get the best matches possible. On the other hand, because of the large numbers of agents, it can be very difficult and time-consuming to find these. Moreover, it is not known what the best possible matches are. How good a matching solution is, is partly subjective and therefore difficult to determine. In matching problems with two-sided preferences, however, there is a good measure of how good a match is, namely the stability of a match. In the stable marriage problem, for example, the men and women that partake in the matching have a preference as to whom they are matched with. In such a matching problem with two-sided preferences, stable matches can be formed. A matching is stable when there is no pair of agents who would both prefer to be matched with each other over their currently assigned partners [Abraham, 2003]. Stability of matches is seen as the main criterion for determining the quality of a matching problem solution [Manlove, 2013]. In matching problems with one-sided preferences, such as the house allocation problem, stability is not relevant: people have preferences over houses, but houses have no preferences. Because this problem only has one-sided preferences, stable matches cannot be formed. Consequently, other criteria have to be set up in order to rank matches. In section 2.1.4, more information will be given on determining the quality of matches for matching problems with one-sided preferences.

2.1.1

Stable Marriage problem

The Stable Marriage problem is an old example of a two-sided matching problem. It is a one-to-one matching problem with two-sided preference. In this type of problem, the goal is to find a stable match between all agents of two equally sized sets. Moreover, every agent has an order of preference over all agents in the other set.

In a stable matching, no two agents both prefer each other over the agents they are matched with. When such a pair exists, the matching is unstable, and a solution containing an unstable pair does not solve the problem. While this problem originally describes the matching of grooms and brides, it can be applied to multiple real-world scenarios. One of the most well-known examples is the matching of graduating medical students to hospitals. Furthermore, the stable marriage problem has been applied to the assignment of internet users to internet servers within a large internet service. Gale and Shapley showed that it is always possible to form stable marriages from lists of preferences [Gale and Shapley, 1962]. The algorithm they designed is very useful; by following its steps, one will always solve a stable marriage problem. This solution can be found in linear time in the size of the input: if the total size of the preference lists doubles, the running time of the algorithm doubles as well [Manlove, 2013]. This is a desirable time complexity, since many algorithms for comparable problems are quadratic at best.
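As an illustration, a compact sketch of the Gale-Shapley algorithm on a small invented instance (the names and preference lists are hypothetical, chosen only for the example):

```python
# Gale-Shapley: free proposers propose in order of preference, and each
# receiver keeps the best proposal seen so far, releasing any worse partner.
def gale_shapley(proposer_prefs, receiver_prefs):
    # rank[r][p] = position of proposer p in receiver r's preference list
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next index to propose to
    engaged = {}                                  # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:    # r prefers p to current match
            free.append(engaged[r])
            engaged[r] = p
        else:
            free.append(p)                        # p stays free, tries next choice
    return {p: r for r, p in engaged.items()}

matching = gale_shapley(
    {"a": ["x", "y"], "b": ["y", "x"]},
    {"x": ["a", "b"], "y": ["b", "a"]},
)
print(matching)  # here both proposers get their first choice
```

The resulting matching is stable: no proposer-receiver pair prefers each other over their assigned partners.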

2.1.2

Hospitals/residents problem

Gale and Shapley also defined the hospitals/residents problem, originally as the college admission problem [Gale and Shapley, 1962]. This is a one-to-many two-sided matching problem with two-sided preferences. It follows the same rules as the stable marriage problem, with the exception that agents on one side can be matched to multiple agents on the other side. This also means that the two sets of agents do not have to be the same size, which, in practice, they rarely are. The algorithm for solving this problem is roughly the same as the stable marriage algorithm and can always find a stable matching in linear time.

2.1.3

House Allocation problem

The House Allocation problem is an example of a matching problem with two groups of agents where one of these groups has preferences over the other. When assigning people to housing spaces, these people are asked to give their order of preference over the available houses. Each person can only be matched with one house. The House Allocation problem thus describes a two-sided one-to-one matching problem with one-sided preferences. Because the preference is one-sided, stability cannot be used as a criterion for ranking matches. Instead, criteria like Pareto optimality are used. A matching is Pareto optimal when no person can be assigned a house they prefer over their current house without assigning someone else a house they prefer less than their current house [Manlove, 2013]. In other words, a matching is Pareto optimal if it cannot be improved for one person without making it worse for another. There are several more ways of ranking matches, for example maximum utility, popularity and profile-based optimality. However, just like Pareto optimality, these are only useful when there is a strict preference order. These methods are not suitable for ranking matches resulting from the reviewers/submissions problem, since this problem has no strict preference order, but rather bids for each paper [Kavitha, 2020].
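A small illustration with invented preferences may help here. The sketch below tests only pairwise swaps, which is a necessary (not sufficient) condition for Pareto optimality: if two people can swap houses so that at least one is better off and neither is worse off, the allocation is certainly not Pareto optimal.

```python
# Check whether any two people could profitably swap houses.
def swap_improvable(allocation, prefs):
    # prefs[person] is a list of houses, most preferred first
    rank = {p: {h: i for i, h in enumerate(hs)} for p, hs in prefs.items()}
    people = list(allocation)
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            ha, hb = allocation[a], allocation[b]
            gain_a = rank[a][hb] - rank[a][ha]  # negative => a prefers b's house
            gain_b = rank[b][ha] - rank[b][hb]
            if gain_a <= 0 and gain_b <= 0 and (gain_a < 0 or gain_b < 0):
                return True                     # a beneficial swap exists
    return False

prefs = {"ann": ["h1", "h2"], "bob": ["h2", "h1"]}
print(swap_improvable({"ann": "h2", "bob": "h1"}, prefs))  # True: swapping helps both
print(swap_improvable({"ann": "h1", "bob": "h2"}, prefs))  # False
```

In the first allocation both people hold their second choice, so swapping makes both better off; the second allocation admits no such swap.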

2.1.4

Finding high quality matches for one-sided preference

The stability of a match is a significant measure of match quality; it is the main criterion for ranking matching problem solutions [Manlove, 2013]. Unfortunately, stable matches can only be formed under two-sided preferences. Problems like the House Allocation or reviewers/submissions problem have one-sided preferences, which means another criterion for ranking matches is needed. Promising alternatives are social welfare and equity or fairness [Kimbrough and Kuo, 2010]. Social welfare is the sum of the agents' preference scores over their matches, and equity is the sum of the differences in preference scores between agents. Both of these measures can be used for many-to-many matching problems with one-sided preferences. They provide a clear picture of how many matches are desired and also show the proportion between desirable and undesirable matches. Social welfare shows the total happiness of all agents, while equity shows the difference in happiness between agents. It is desirable to get a high social welfare and a low equity.
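The two measures can be sketched in a few lines of Python. The bid scores below are invented, and the equity formula is one plausible reading of "sum of differences in preference scores" (total absolute deviation of each reviewer's score from the mean reviewer score, lower meaning fairer), not a definition taken from the thesis:

```python
# Social welfare: the sum of the bid scores of all chosen matches.
def social_welfare(matches, score):
    # matches: list of (reviewer, submission); score: dict of bid scores
    return sum(score[m] for m in matches)

# Equity (assumed reading): total absolute deviation from the mean
# per-reviewer score; 0 means every reviewer is equally well off.
def equity(matches, score, reviewers):
    per_reviewer = {r: 0 for r in reviewers}
    for (r, s) in matches:
        per_reviewer[r] += score[(r, s)]
    mean = sum(per_reviewer.values()) / len(per_reviewer)
    return sum(abs(v - mean) for v in per_reviewer.values())

score = {("r0", "s0"): 70, ("r0", "s1"): 30, ("r1", "s0"): 0, ("r1", "s1"): 70}
matches = [("r0", "s0"), ("r1", "s1")]
print(social_welfare(matches, score))        # 140
print(equity(matches, score, ["r0", "r1"]))  # 0.0: both reviewers score 70
```

A matching with the same welfare but scores of 140 and 0 for the two reviewers would have an equity of 140, illustrating why a low equity is preferred at equal welfare.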

Although it is not always easy to determine whether a match is desirable, this is not very important for computing social welfare and equity; these criteria simply describe properties of the model of matches that was found. The desirability of a match is usually subjective, but when a preference is expressed as either yes or no, it is quite obvious. In the case of a maybe bid, it depends on the nature of the data and the preference of the one who organizes the matching.

Another important factor in matching algorithms is the running time of the algorithm. While the time needed to compute a model does not say anything about the quality of that model, it is important to limit it. Especially when working with large data sets, an algorithm with exponential running time can result in a computing time of hundreds of hours, which is very impractical and should therefore be avoided [Alekhnovich et al., 2006]. For this reason, it is important to take the computing time of an algorithm into account when evaluating it.

2.2

Satisfiability solving

Satisfiability or SAT solving is a technique where a problem is expressed in Boolean logical rules in order to solve it. Boolean logic is a type of logic where variables can only take one of two values: true or false. These variables form logical rules or expressions when they are combined with conjunctions, disjunctions, negations and parentheses. Most SAT solving algorithms require a problem to be expressed in CNF: a Boolean expression is in Conjunctive Normal Form when it is expressed only as a conjunction of disjunctions [Schuler, 2005]. Once a problem is translated into CNF rules, the Boolean variables in the rules are all set to either true or false in such a way that the total formula evaluates to true. If such an assignment of values is possible, the problem is satisfiable. Problems are not always satisfiable; in such a case there is no way to assign values to the Boolean variables that leads to a formula evaluating to true.

SAT has its origins in logic, graph theory and theoretical computer science and is quite often used in computer science and computer engineering [Biere et al., 2009]. This shows the versatility of satisfiability solving. It is used in many fields and combines these fields, which makes it so useful for solving real-world problems.

2.2.1

SAT algorithms

There are multiple examples of SAT algorithms. The foundation of almost all SAT algorithms is the DPLL algorithm, named after its creators (Davis, Putnam, Logemann and Loveland) [Sinz, 2007]. The algorithm can find solutions for problems in CNF using backtracking, by means of a branching procedure: values of variables are chosen, but when a variable turns out to have the wrong value, the algorithm reverts to a previous state [Nieuwenhuis et al., 2006]. An algorithm that was inspired by DPLL is the conflict-driven clause learning (CDCL) algorithm. This algorithm learns clauses that identify the assignments responsible for conflicts. Variable values that have proven to cause a conflict will be avoided, which boosts the efficiency of the solving.

Another, more recent SAT solver based on CDCL is MiniSAT [Eén and Sörensson, 2003]. This is a very efficient open-source SAT solver that can easily be used to solve SAT problems in relatively little time [De Cat et al., 2014]. Another efficient, state-of-the-art, open-source solver that is also based on CDCL, and that will be used in this thesis, is the z3 theorem prover from Microsoft Research [Bjørner and Nachmanson, 2020]. A big advantage of this solver is its flexibility in terms of input: many SAT solvers require the input to be in CNF, but the z3 theorem prover can work with many forms of data.

2.2.2

NP-complete

The first problem that was ever proven to be NP-complete is SAT [Ogheneovo, 2020]. A problem is NP-complete when it is in NP and every problem in NP can be translated into it in polynomial time [Ogheneovo, 2020]. A problem is in NP when it can be solved in polynomial time using a non-deterministic method. An algorithm runs in polynomial time when its time complexity is bounded by some polynomial, for example O(t^2). When a problem has an exponential running time (O(x^t)), it is usually not solvable within reasonable time, especially when working with big sets of data. A non-deterministic method is a method that can produce different results for the same input, as illustrated in figure 2.1.

Non-deterministic methods are, however, not fit for solving problems in practice. Yet, because many SAT algorithms are highly optimised, most problem instances can still be solved within reasonable time.

Figure 2.1: Deterministic vs. Non-deterministic.

2.2.3

SAT for matching problems

In practice, many SAT algorithms can, when a solution exists, usually solve a problem in time close to O(n^2). Using computers, this means that these problems can usually be solved within reasonable time, even when there are many variables. For many-to-many two-sided matching problems under preference, efficient algorithms also exist. However, it is challenging to find stable matches in many-to-many matching problems. Because of this, compromises very often have to be made. It can nonetheless be very time-consuming to find a satisfying solution, and this is why satisfiability solving will be used for finding solutions.

SAT solving can take time O(2^n), but this is a worst-case scenario; there are efficient SAT solvers that can usually solve problems in nearly O(n^2). This is why it is interesting to apply satisfiability solving to many-to-many matching problems under preference. Besides the efficient algorithms, rules can easily be set up and changed. This is very useful, since a match is not always stable, but there is still a difference in the quality of matches, and the quality should be as high as possible.

Chapter 3

Approach

This chapter will provide an overview of the process of solving the reviewers/submissions matching problem using SAT solving. There will be information about the data, the algorithm and the collection and interpretation of the results.

3.1

The data

The five data sets used in this experiment were published by PrefLib [Mattei and Walsh, 2013]. Three of these sets were taken from computer science conferences (files 2.1, 2.2 and 2.3) and two were taken from the Autonomous Agents and Multiagent Systems Conference in 2015 and 2016 (files 4.1 and 4.2). Each data set contains a number of reviewers, a number of papers and biddings of the reviewers over almost all of the submissions. Reviewers were not allowed to review papers they had been working on themselves. The data was pre-processed to make the five data sets similar, so they could all be used in this experiment in the same way. The reviewers in the data gave their preference for reviewing all non-conflicting papers: all papers with which they had no conflict were marked with either a yes, maybe or no. The data and the pre-processing resulted in five csv files, containing reviewers, submissions and bids, on which this experiment was conducted.

Figure 3.1 gives an overview of the five files containing the data, and figure 3.2 shows the number and percentage of yes bids in the files. There are two files with relatively few reviewers and submissions and three files with a high number of reviewers and submissions. The ratio of reviewers to submissions varies between 0.33 and 0.83 over the five data sets. The number of reviewers and submissions greatly influences the file size, as each reviewer bids on roughly every submission. Looking at files 2.2 and 4.1 in particular, the number of reviewers and the number of submissions are both roughly ten times greater in file 4.1 than in file 2.2. While file 2.2 contains 1,150 data points, file 4.1 contains 122,570 data points, which is approximately 100 times as many.

Figure 3.1: Some properties of the data files.

Figure 3.2: Number and percentage of yes bids in the data sets.

Here is some more specific information on the five csv files that contained the data:

Data set 2.1 contains 31 reviewers and 54 submissions. This means there is a rev/sub ratio of 0.57, so a proportion of 1:1.74. It contains 1629 rows, so 45 potential matches are ruled out. Roughly ten percent of all bids are yes bids.

Data set 2.2 contains 24 reviewers and 52 submissions, so there is a ratio of 0.46 or 1:2.17. It contains 1150 rows, which means that 98 matches are ruled out. Almost 18% of the bids are yes bids.

Data set 2.3 contains 146 reviewers and 176 submissions. The ratio is 0.83 or 1:1.21, and 133 of the 25696 possible matches are ruled out. About one percent of the bids in this file are yes bids.

Data set 4.1 contains 201 reviewers and 613 submissions. It has a ratio of 0.33 or 1:3.05, and 640 matches are ruled out, so 122570 rows remain.

Data set 4.2 contains 162 reviewers and 442 submissions, so a ratio of 0.36 or 1:2.73; it contains 71022 rows with 582 matches ruled out. This file also contains about one percent yes bids.

3.2

The experiment

The experiment was conducted in Python, using the z3 library from Microsoft Research [Bjørner and de Moura, 2019]. Yet, before anything could be done in Python, the reviewers/submissions matching problem had to be converted into a SAT problem.

3.2.1

Logical rules

To convert the reviewers/submissions problem into a SAT problem, the matching problem had to be translated into logical rules. The first step was the creation of variables that combine a reviewer and a submission. These variables express the matches that could be made: a variable is false by default, but when a match is made, the variable consisting of the reviewer and submission between whom the match is formed is set to true. Besides expressing the problem using logic, the problem also had to be solved and its solution optimised. To get a desirable model of matches, and to be able to optimise the results, some parameters had to be variable. The first was the distribution of bid scores. The bids that were expressed as yes, maybe and no had to be converted into numbers the algorithm could work with, and there are many options for this distribution. In this experiment, the following distributions for yes/maybe/no were chosen: 90/10/0, 80/20/0, 70/30/0 and 60/40/0. After looking at the results, some other distributions like 85/14/1, 90/9/1 and 95/4/1 were added. Many bid scores were experimented with, in order to see how the results would change for different variable values.

Taking into account the factors stated above, there are two types of rules: if-rules and less-or-equal rules. The if-rules are built as follows:

There are three values in each rule: one Boolean variable and two scores. Using this construct, one of the two scores is picked, depending on the value of the variable. When a match is made, the variable of the corresponding couple is set to true and the first score is selected; when a match is not made, the second score is chosen. In the case of a 70/30/0 distribution for yes/maybe/no, the rules would look something like this:

Yes: If(reviewer0submission0, 70, 0)
Maybe: If(reviewer0submission0, 30, 0)
No: If(reviewer0submission0, 0, 0)

The other type of rule consists of a list of Boolean variables, each paired with the number one, where the list is closed with an n, which describes the cardinality. Because the cardinality should be enforced on both reviewers and papers, each reviewer-submission pair appears in two rules: one ensures that a reviewer does not get too many papers to review, and one makes sure that a paper does not get too many reviews. The algorithm can then set at most n variables from each list to true. Each variable has the same weight, which is why they all have a one next to them.

For six submissions, the rules would look something like this for each reviewer, where rev0 denotes reviewer zero and sub0 denotes submission zero:

PbLe([(rev0sub0,1), (rev0sub1,1), (rev0sub2,1), (rev0sub3,1), (rev0sub4,1), (rev0sub5,1)], n)

For six reviewers, the first submissions rule would look something like this:

PbLe([(sub0rev0,1), (sub0rev1,1), (sub0rev2,1), (sub0rev3,1), (sub0rev4,1), (sub0rev5,1)], n)

In the experiment, multiple values for n have been used. First, values ranging from one to ten were tested. After this, the cardinality was split: n would now represent the maximum number of submissions per reviewer and m would represent the maximum number of reviews per submission. This was done because it is desirable to have many reviews per submission, for a higher overall review quality, but few submissions per reviewer, to limit the time and effort required of reviewers. Depending on the data set, either the n or the m limits the total number of matches. For a data set with five reviewers and ten submissions, and an n of two and an m of five, one can see that with at most two submissions per reviewer, it is not possible to get five reviews per submission; this would require 25 reviewers. In cases with more reviewers than submissions, this works the other way around. Because it is desirable to have many reviews per submission and a low number of submissions per reviewer, a low n and a high m were chosen for the given data sets. For example, an (n, m) of (3,7) would in most cases give the same result as (3,10) or (3,20). Therefore, in many tables, only the n is displayed, as this was the limiting factor; a tenfold increase of m would not influence the result in any way.

In most cases, n was given a value of at least two and at most five. This seemed a reasonable range of submissions for a reviewer to review, while still leaving some room for experimentation. With these data sets, choosing one reviewer per submission would lead to some submissions not being reviewed at all. On the other hand, assigning more than five submissions to a single reviewer would most likely be at the expense of quality. Reviewers for conferences usually have limited time to read and review the submissions while also continuing their usual jobs or research. Assigning too many submissions to a reviewer would take up too much of their time, which could leave them without enough time to write reviews of high quality.

3.2.2

The programming

The csv files that contained the data were imported using the pandas library. To create the PbLe rules for reviewers and for submissions, the data had to be sorted twice: once from the reviewer standpoint and once from the submission standpoint. Next, the variables had to be created by combining the reviewer and submission number. The bids had to be translated into numbers, depending on the chosen ratio. These two were then combined into a dictionary for easy access to the scores linked to the corresponding matches. Since some of the data files contained a couple of hundred reviewers or submissions, a lot of rules had to be created. These rules had to be formed automatically, but also had to fit exactly into the syntax expected by the algorithm. The only way to make the process semi-automatic was to print the rules with the print function. The rules were not allowed to contain characters like ' or ", which was difficult to achieve without printing them: because of their form, Python could not store the rules as variables, so they had to be treated as strings, and Python encloses every string as 'string' or "string", a shape that z3 did not recognize. Because of this, the output had to be printed without apostrophes, and the printed rules had to be copy-pasted manually as input for the z3 algorithm. While this is not too problematic, it meant that the algorithm was not very flexible, because the rules had to be printed and fed in again for each new file.
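A minimal sketch of this data preparation step, assuming hypothetical column names reviewer, submission and bid (the actual csv layout may differ) and an assumed 90/10/0 score ratio:

```python
import pandas as pd

# In the experiment this would be pd.read_csv(...); a tiny inline frame is used here.
df = pd.DataFrame({"reviewer":   [0, 0, 1],
                   "submission": [0, 1, 0],
                   "bid":        ["yes", "maybe", "no"]})

# Translate the bids into scores for the chosen ratio.
bid_score = {"yes": 90, "maybe": 10, "no": 0}

# Dictionary for easy access to the score linked to each possible match.
scores = {(row.reviewer, row.submission): bid_score[row.bid]
          for row in df.itertuples(index=False)}
```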

3.2.3

Optimization

When creating a model, it is best if each submission gets many reviews. With more reviews, a clearer image of the quality of the article can be formed and the overall ranking will most likely be less biased and more trustworthy. Yet, it is not desirable to assign many submissions to each reviewer, since reviewing a scientific paper takes considerable time and effort. Because of this conflict, it is important that different cardinalities are tested and compared to get the best compromise. After that, the variable values can be changed accordingly to improve the results. Besides getting a desired number of submissions per reviewer with an acceptable number of reviews per submission, the main criterion for ranking a model will be the social welfare. This social welfare is the total score: the sum of all bid scores corresponding to the matches in the model. The equity of the models will also be measured, by taking the sum of the differences in score between the matches.

To get a clear picture of the data, some straightforward score distributions and cardinalities were used, namely n ranging from one to ten and the yes/maybe/no distributions 60/40/0, 70/30/0, 80/20/0, 90/10/0 and 100/0/0. After looking at the corresponding results, some more data-set-specific research was done. For the files 2.1 and 2.2, many combinations of bid scores were tested. The files 2.3, 4.1 and 4.2 were usually too big to handle in an acceptable amount of time, but this will be discussed later.


3.3

Algorithm output

First, a score was calculated, namely the social welfare. The algorithm used this score to optimise its result. The score is the total of all bid scores that are used in matches. For example, if a ratio of 70/30/0 is chosen to score yes/maybe/no, a match with a yes bid will add 70 to the score and a maybe bid will add 30, while a no bid will not raise the score. The total score is the most important criterion for ranking the matchmaking, because it represents the matches that are made and the bids linked to those matches, and it is also the quantity the model is optimised for. Moreover, because the bid scores can be given various values, the resulting match model is flexible and can be tuned for a single data set. Secondly, the running time of the algorithm was measured using the Python timeit library. The running time gives a clear image of the usability of specific variables for a data set, especially when combined with the data set size, since the algorithm takes longer to create a model for larger data sets. The resulting running times were all measured just once, so they may vary when running the algorithm again. It would be more accurate to run the algorithm a number of times and take the average running time. There are two main reasons why this was not done in this experiment. First, some of the files with some variable settings took hours to run; there simply was not enough time within this project to run such code multiple times. Second, even for the code that took little time, the running times functioned mostly as an indication of what could be expected: their purpose was to show the order of magnitude of the running time, rather than the exact time that running the algorithm would take on average.
Still, it is worth noting that an effort was made to keep all other factors that could impact the running time equal. This means, for example, that only one file was run at a time. Moreover, all running times were computed on the same computer with an Intel Core i7 processor. Finally, the numbers of matches with a yes, maybe and no bid were counted each time. This shows more clearly how the score was built up. The resulting time, social welfare (score) and distribution of yes, maybe and no matches were used to calculate some other properties. The equity was calculated by adding up the differences in score between matches, using the following formula (where s represents the assigned score of the corresponding bid and n the number of matches with that bid): n_maybe * (s_yes - s_maybe) + n_no * (s_yes - s_no) + n_no * (s_maybe - s_no).
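In code, these derived measures amount to the following; this is a direct transcription of the formula above, with function names of our own choosing:

```python
def social_welfare(n_yes, n_maybe, n_no, s_yes, s_maybe, s_no):
    """Total score: the sum of all bid scores used in matches."""
    return n_yes * s_yes + n_maybe * s_maybe + n_no * s_no

def equity(n_yes, n_maybe, n_no, s_yes, s_maybe, s_no):
    """Summed score differences between matches, as defined in the thesis.
    Note that n_yes does not occur in the given formula."""
    return (n_maybe * (s_yes - s_maybe)
            + n_no * (s_yes - s_no)
            + n_no * (s_maybe - s_no))

# Example: the 60 yes and 2 maybe matches reported for data set 2.1 at n=2,
# scored with a 90/10/0 ratio.
print(social_welfare(60, 2, 0, 90, 10, 0))  # 5420
print(equity(60, 2, 0, 90, 10, 0))          # 160
```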

The average time, social welfare and equity were also calculated by dividing the running time of the algorithm, the computed social welfare and the equity by the number of submissions per reviewer. This was done to get a clear image of the development when assigning more submissions to individual reviewers.


Chapter 4

Results

In this section, the results of the experiment will be discussed. First, some general results are presented. After that, more in-depth and data-set-specific results follow. Finally, some results for all data sets combined are shown.

4.1

First results

Because of big file sizes, a lot of decisions have to be made before the SAT algorithm arrives at a final model. Because of this, and because of some unknown factors, some results took a very long time to obtain. While SAT algorithms can be very useful, a downside is that it is unknown to the user what exactly happens during the calculation. Therefore, it was unclear why this black-box algorithm took a long time to come up with some of the models while it was able to produce others so quickly. What could be seen early on is that it is worthwhile to split the number of submissions per reviewer and the number of reviewers per submission. At first, a single number was chosen to represent both of these cardinalities. This heavily impacted the running time, as shown in figure 4.1. For this reason, the cardinality was split into an n and an m for the remaining research.


Figure 4.1: Results for data set 2.2 with an equal cardinality for n and m.

Two things stood out when running some of the data. The first was the difference in running time between bid ratios with and without points awarded to no bids. A bid score ratio of 90/10/0 would lead to a result much quicker than a 90/9/1 ratio (this will be seen later on, in figures 4.8 and 4.10). Also, assigning no score to maybe bids decreases the running time of the algorithm, which can be seen in figure 4.11. Secondly, it seemed that the difference between bid scores does not really impact the model with matches, as long as the scores are distinct and larger than zero. For example, a bid ratio of 95/4/1 results in the same matches as a 60/30/10 bid score ratio for the files 2.1 and 2.2, as can be seen in figure 4.2. This is also visible in figures 4.8 and 4.10. (Although the number of matches for these two bid ratios is the same, the running times were much higher for the 60/30/10 models, at 917 and 1613 seconds, compared to 30 and 34 seconds for the 95/4/1 models.) Both of these observations will be discussed in chapter 5.


Figure 4.2: Files 2.1 and 2.2, 95/4/1 and 60/30/10 bid ratios in matches for n=3.

To rank the matches found by the SAT algorithm, three basic values were taken into account: the score, the running time and the ratio of yes/maybe/no matches. Using these three criteria, the equity, average running time, average social welfare and average equity were calculated. With all these values and the data set information, the models created by the algorithm with the given variable values were ranked. First, some experimenting was done on the data to get a better picture of it and to see how the results could be optimised. After this, each individual data set was experimented on separately, because different data sets with different sizes and properties require a different approach. This data-set-specific research is shown in section 4.2.

In figure 4.3, the properties of the data are shown. It is important to compare these properties with the results, because the two are connected: the number of rows in a file influences the running time and the reviewer/submission ratio influences the social welfare score. It is also important to take the percentage of yes bids out of the total number of bids into account. For a file with a higher relative number of yes bids, it should be easier to find better matches. These numbers can again be seen in figure 4.4.

Figure 4.3: The five csv files with their properties.

Figure 4.4: Number and percentage of yes bids in the data sets.

In figure 4.5, one can see clearly how the running time increases when using a larger data set to compute a model. The number of rows in a file is roughly the number of reviewers multiplied by the number of submissions, with the exception of some ruled out possible matches.


Figure 4.6 shows how different files react to an equal bid score ratio and cardinality. While the average social welfare for file 2.1 drops with a higher number of submissions per reviewer, the average social welfare for file 2.2 stays practically the same. This can be explained by looking at the reviewer/submission ratios for files 2.1 and 2.2 in figure 4.3. File 2.1 has a ratio of 0.57 while file 2.2 has a ratio of 0.46. Because file 2.2 has relatively more reviewers than file 2.1, the submissions can be distributed more easily. It is desirable to have many reviews per submission but few submissions per reviewer, and in a file with a higher reviewer/submission ratio this is easier to accomplish. For both files, n, the number of submissions per reviewer, is the limiting factor. Yet, in file 2.2 it is easier to reach a similar average number of reviews per submission as in file 2.1 while having a lower average number of submissions per reviewer. This shows how the file ratio can influence the average social welfare.

Figure 4.7 shows the average running time of the algorithm. As mentioned earlier, the running time can be quite unpredictable. While file 2.1 shows a fairly steady average running time, file 2.2 shows a growth in average running time for n ranging from two to four, but for five submissions per reviewer the average running time decreases. It is unclear why this happens; this is a disadvantage of working with black-box algorithms like SAT solvers.


Figure 4.7: Files 2.1 and 2.2, 90/9/1, average time (time/n) with m=10.

4.2

Specific research

As stated before, the time, score and number of matches with certain bids will be collected. The measured time, social welfare and equity will also be divided by n, in order to see the average score and time per reviewer. Here, n is the number of submissions per reviewer (S/R); the number of reviewers per submission, m, was set to ten. With this m, the limiting factor is always the number of submissions per reviewer. It does not matter whether m is a bit lower or much higher, because the reviewers are in all cases completely saturated while no paper receives ten reviews. Setting m high does introduce the risk of some papers getting more reviews than others. However, multiple attempts with a lower m did not yield a different result; only when m becomes the limiting factor does the outcome change. Therefore, m was set to ten for all the following results.

4.2.1

Data set 2.1

For data set 2.1, it was relatively easy to find high quality matches for a low number of submissions per reviewer. For example, of the 62 submissions that were reviewed for a cardinality of two, only two were assessed with a maybe bid while the other 60 had a yes bid. This means there is almost no room for optimization. As can also be seen in figure 4.8, n=3 also had a high percentage of yes bids. As the number of submissions per reviewer increased, the percentage of yes bids decreased. With this increase, the equity also increased; especially for five submissions per reviewer, the average equity increased a lot. This was mainly true when a high score was assigned to yes bids: while the social welfare would increase, so would the equity. It depends on personal preference what proportion between these two values is best; for this data, a compromise is always necessary with a higher n. When one chooses to assign two submissions to reviewers, the social welfare is relatively high and the equity relatively low. However, this would lead to most submissions getting only one review. To get at least two reviews per submission on average, a cardinality of at least four is necessary.

The running time for this data set was acceptable overall, ranging from roughly one to 160 seconds. Assigning zero points to no bids greatly reduces the running time, barely decreases the social welfare and, on top of that, lowers the equity. The only real downside is that the total number of matches decreases, but in this case only a few matches are lost. Assigning zero points to maybe matches would lead to a loss of 3.3%-17.6% of the matches, and thus reviews, depending on the cardinality.

It is also interesting to see that different bid score ratios do not impact the number of yes, maybe and no matches for this data set. This phenomenon can be seen in figure 4.9 and will be discussed in chapter 5.


Figure 4.9: File 2.1, social welfare for 2 submissions per reviewer with different bid ratios.

4.2.2

Data set 2.2

For this data set, the phenomenon shown in figure 4.9 also holds. This can be seen in figure 4.10 by looking at the bid ratio. While adding a score to no bids does change the number of no bids, changing the ratio between scores does not change the outcome of the algorithm.

While the reviewer/submission ratio in this file is a bit lower than in data set 2.1, the percentage of matches with a yes bid is high for this file. There were few maybe and no bids, even with five submissions per reviewer, which led to a high social welfare and a low equity. A high score for matches with a yes bid performed better than lower scores. Running the algorithm with a yes/maybe/no ratio of 100/0/0 led to 110 reviewed submissions, all with a yes bid. This does result in a loss of ten submissions, but also results in a maximum social welfare and an equity of zero, even with a relatively high number of submissions per reviewer. Furthermore, a result with n=5 was found in 0.7 seconds, which is very quick. When looking at the 90/9/1 ratio in figure 4.10 for cardinality five, one can see that five reviews are still missing there, so a ratio of 100/0/0 actually only results in a loss of five instead of ten reviews.


With a lower cardinality, some submissions are not reviewed at all. A cardinality of at least five is required to give the average submission at least two reviews.

Figure 4.10: File 2.2, results for 2-5 submissions per reviewer.

4.2.3

Data set 2.3

This file contained many more rows than 2.1 and 2.2, which resulted in a very high average running time of the algorithm. The results that were found can be seen in figure 4.11. An x in the time column indicates that the algorithm had been running for quite some time (at least one hour) without producing a complete model and was therefore shut down. As can be seen, there are no results for a bid score ratio that assigns points to matches with no bids: an attempt with n=2 and a ratio of 95/4/1 took over four hours and was then shut down because no result was generated. For this data set, it took relatively long to compute any result, and the impact of assigning points to maybe or no bids on the computing time is very big. Yet, the impact of the cardinality on the running time is even bigger. One would expect a result for n=4 with a 100/0/0 ratio within a couple of minutes, since only matches with a yes bid need to be made and this could be done for n=3 in under a minute. However, this was not the case: the algorithm kept running for over an hour without a result. While it almost certainly would have generated a model eventually, it is unclear how much time this would have taken, which is very impractical.

Even though the results are lean, a cardinality of two seems to work fine for this data set. The algorithm was able to find 288 matches with a ratio of 90/10/0. This means only four matches are missing, which could probably have been complemented with matches with a no bid, but this would take the algorithm a long time. For n=2, 92.5% of the total possible number of matches was assessed with a yes bid. When also taking the maybe bids into account, 98.6% of the possible matches were made using a 90/10/0 ratio. This means almost no reviews are lost. Because of the high percentage of matches with yes bids, the equity can be kept low overall for a cardinality of two.


For a cardinality of three, only ten reviews are missing. Of the possible total number of matches, 88.8% were assessed with a yes bid. For the 90/10/0 ratio, another 8.9% of matches were assessed with a maybe bid. This gives a total coverage of 97.7% of the possible number of matches. This is very high, so it might not be necessary to aim for 100% coverage, as this clearly takes much more computing time with little room for improvement. While the results are promising, the algorithm took very long to get results for a cardinality of four and five, so it is unclear whether there would still be a high percentage of matches with yes bids there. Still, the average submission gets at least one review for a cardinality of two and at least two reviews for a cardinality of three.

Figure 4.11: File 2.3, results for 2-5 submissions per reviewer.

4.2.4

Data set 4.1 and 4.2

Because of the huge size of these data sets, few results have been generated; hence these results will be discussed here together. Data set 2.3 already had trouble generating results in an acceptable amount of time. Data set 4.1 is almost five times and data set 4.2 almost three times the size of data set 2.3, so it is plausible to assume that the running time will show at least a similar increase.

Figure 4.12 shows the results for data sets 4.1 and 4.2. The first two rows show results of the algorithm; the second two rows are calculated results. The numbers used for the 90/10/0 ratio are based on the algorithm output for 70/30/0. Such a calculation can be justified because the previous results showed that, for different ratios, the numbers of yes, maybe and no bids stayed the same as long as the scores were larger than zero and not set equally. Therefore, there seems to be no reason to assume this would be any different for these two files, and because of the long running time this approach was chosen. Because the average number of reviews per submission for both files is around one, it would be even lower for a cardinality of two and therefore relatively useless. It would be interesting to see results for a higher cardinality. However, because of the time it took to compute a model for n=3 and the commonly large increase in running time from n=3 to n=4 for other files, no attempt was made to get results for a cardinality higher than three, as this would most likely have taken far longer than would be practical.

Figure 4.12 shows that for file 4.1, only four matches out of the possible 603 are missing. This results in a coverage of over 99% of matches, which justifies the omission of matches with a no bid: adding these would most likely have taken very long while yielding little. Setting the ratio to 90/10/0 instead of 70/30/0 would, according to the calculations, lead to an increase of almost 20% in social welfare. However, this would also cause the equity to roughly double. It should be noted that acquiring the model with matches took over 24 hours with these settings.

For data set 4.2, the running time was significantly lower, as results were acquired in under an hour. Yet, it is questionable whether this can be considered a practical running time. For this file, it was possible for the algorithm to assign all reviewers only submissions that they had assessed with a yes or a maybe bid for a cardinality of three. This means that in this case, assigning a score to no bids would have no impact on any of the results except the running time. Again, when using a 90/10/0 ratio instead of a 70/30/0 ratio, the social welfare would not even increase by 20% while the equity would double.


4.2.5

Data sets combined

It is also interesting to combine the results of all files, so that the results of the files with equal variables can be compared. The only combination of variables that was tested on all files is a bid score ratio of 90/10/0 and a cardinality of three, so (n, m) = (3, 10). In figure 4.13, the properties of the data sets are combined with their results for these variable values. Figure 4.14 shows the relation between the reviewer/submission ratio and the average number of matches per submission. There is a clear relation: fewer submissions per reviewer are linked to a higher number of reviews per submission. This also makes sense intuitively, but it is still useful to verify it so that the results are interpreted correctly.

Figure 4.13: Results for all files with a 90/10/0 ratio and cardinality of three.

Figure 4.14: Relation between reviewer/submission ratio and matches/submission ratio for a cardinality of three and a bid score ratio of 90/10/0.


Figure 4.15 shows the relation between the approximate number of rows and the running time of the algorithm. It is a little difficult to tell the time complexity of the algorithm based purely on this result. However, when looking at running times of similar experiments, the results start making a lot more sense [Balint et al., 2015]. In order to make a good comparison between those results and the SAT solving results for the reviewers/submissions problem, figure 4.16 shows the number of matches in relation to the algorithm running time. Figure 4.17 shows some of the results of the SAT Challenge 2012 solver competition. Figures 4.15 and 4.16 by themselves are not enough to prove that the running time shows quadratic complexity. Yet, stronger support for this statement emerges when the results from the competition are combined with those in figure 4.16, since the SAT algorithms in the competition also had a quadratic running time [Balint et al., 2015].

Figure 4.15: Relation between the number of rows of a file and the algorithm running time for a cardinality of three and a bid score ratio of 90/10/0.


Figure 4.16: Relation between the number of matches and algorithm running time for a cardinality of three and a bid score ratio of 90/10/0.

Figure 4.17: One of the results of the SAT challenge 2012 solver competition, relation between instances and running time.


Figure 4.18 shows the relation between the reviewer/submission ratio and the average social welfare. It is difficult to see a clear relation between the two. Figure 4.19 shows a similar comparison, but with the average equity instead of the average social welfare. Again, no clear relationship can be discovered.

Figure 4.18: Relation between reviewers/submissions and average social welfare for a cardinality of three and a bid score ratio of 90/10/0.

Figure 4.19: Relation between reviewers/submissions and average equity for a cardinality of three and a bid score ratio of 90/10/0.


In figure 4.20, a correlation between the ratio of yes bids in a data set and the ratio of matches with a yes bid can be seen. While the correlation is not very strong, a link between the two percentages is visible. This again shows how the properties of a data set can influence the model that SAT produces and its properties.

Figure 4.20: Relation between the percentage of yes bids in matches and the percentage of yes bids in files.


Chapter 5

Conclusion

To answer the main research question of this thesis, How to find high quality matchings for many-to-many two-sided matching problems using satisfiability solving?, it is important to take several factors into account.

For example, the possibility of finding high quality matchings depends on the data set. Data set 4.1 had a much higher ratio of matches with a yes bid than data set 4.2. This difference cannot be explained with the results that are currently available. While at first it was deemed possible that this difference was linked to a difference in the ratio of yes bids in the files, this turned out not to be the case: as can be seen in figure 4.4, data set 4.2 has a higher percentage of yes bids than file 4.1. It may be that file 4.2 contains some submissions that were disproportionately popular, which caused many yes bids to overlap. It might also be possible that better results could be found for this data set using different variable values or a different SAT algorithm, but this is uncertain. In any case, more research is necessary to find a cause for this difference.

It is interesting to see that changing bid score ratios does not impact the bid ratio of the matches as long as a few rules hold. First, the bid scores should be at or above zero and should not be set equal to each other. Secondly, the following should always be true: s_yes ≥ s_maybe ≥ s_no. When looking more closely at the optimisation method of the algorithm, it actually makes sense intuitively that the matches are not affected by the exact bid scores. The algorithm tries to optimise the social welfare score, and when deciding which matches to make, a match with a yes bid will always be chosen over a maybe bid because its score is higher; the size of the difference between the two does not matter to the algorithm. The only exception would be a case where a single match with a yes bid unavoidably forces several matches with no bids, where a maybe bid would instead allow various other matches with maybe bids. However, this data usually contained very many yes bids, quite few maybe bids and little to no matches with no bids, so this situation does not apply here. This means that the bid score ratio only impacts the equity.

For a model, it is best if the equity is minimized. This can be done by taking a bid score ratio similar to the bid ratio of the matches. This would most likely not change the matches that are made, since the algorithm uses the social welfare to optimise its model. While equity can be an important measure, in this experiment it mostly reflected the percentage of matches with a yes bid: when that percentage is higher, fewer matches with maybe and possibly no bids exist, and therefore the equity is lower. Used in this way, it is not a very meaningful measure, since the numbers of matches with yes, maybe and no bids are already counted.

While it is important to get high quality matchings, it is also important that reviewers and writers of submissions agree with the matches that have been made. While this might be the case regarding the bids, the number of submissions per reviewer and the number of reviewers per submission should also be acceptable. As mentioned earlier, it is not desirable to assign many submissions to a single reviewer. On the other hand, as can be seen in figure 4.13, files that contain a high number of submissions compared to reviewers can be problematic in this regard. Especially for the data sets 4.1 and 4.2, there is approximately one review per submission while the average reviewer has to review three submissions. In these cases it is difficult to find a good balance between the cardinality, here the number of submissions per reviewer, and the number of reviews per submission. For a data set with a more balanced reviewer/submission ratio, it is easier to find a desirable number of reviews per submission while limiting the number of submissions each reviewer has to handle.

For data set 2.1, it is quite difficult to say what approach should be taken to find high quality matchings. While a lower cardinality comes with a higher percentage of yes bids, the number of reviews per submission is also lower. It is probably worthwhile to assign a score to maybe bids, since they make up a substantial part of the matching when given a score above zero. A score for no bids, on the other hand, seems much less necessary: while it depends on personal preference whether matches with a no bid should receive a score, they add little to no matches overall. Other than that, different ratios do not seem to impact the matches that are made. Therefore, to get high quality matchings for this file, it is probably best to assign a score above zero to at least yes and maybe matches. Setting a score above zero for no bids is optional, and the number of reviews per submission depends on personal preference: a lower number comes with a higher match quality and a lower running time, while a higher number comes with a higher coverage per submission.


For data set 2.2, it seems best to use a 100/0/0 bid score ratio, as this reduces the running time substantially while barely impacting the review coverage of articles. It is probably best to use a cardinality of at least five for this file, to ensure each submission gets at least two reviews on average. Yet, a reviewer would probably not appreciate being assigned more than five submissions. Therefore, in this case a 100/0/0 ratio with a cardinality of five seems best.

For data set 2.3, it is more difficult to decide whether a score larger than zero should be used for maybe bids, especially since the very high running times make it impossible to say what the results are for a cardinality above three. However, for three submissions per reviewer, the average number of reviews per submission is above two, due to the ratio between reviewers and submissions. This means that it might be a good idea not to assign a score above zero to maybe bids. The downside would be more inequality between reviewers and submissions in terms of the number of reviews that results from the model.

For the files 4.1 and 4.2, it is very difficult to get high quality matches. This is mainly due to the high running times, which can be attributed to the large file sizes. A cardinality of three seems insufficient for these data sets, because a single review per submission on average is too little; some submissions are not reviewed at all. For these data sets, SAT solving is probably not the best option, because the running times of the algorithm are too high to be considered practical. While the percentage of yes bids is quite high, other algorithms might be able to find similar results in less time.

From the results, it becomes clear that SAT solving can be useful for solving the reviewers/submissions matching problem. However, the running time of the algorithm increases strongly with the size of the data set, and even for a small data set some variable settings can cause the algorithm to take multiple hours to find a solution, as is shown in figure 4.1. Still, when the cardinalities are given suitable values and the reviewer/submission ratio is not too unbalanced, the algorithm can usually produce high quality matchings in a relatively short time for smaller data sets. Ignoring the running time, results for large data sets are also quite promising. Yet the high running times remain, which is impractical in many cases, and therefore SAT seems unfit overall for making matches for files that contain many bidding rows.


Chapter 6

Future research

While the results of the SAT approach to solving the reviewers/submissions problem are not always great, some of them are quite promising. Therefore, it might be worth looking into the running time of the algorithm. If the running time could be predicted more accurately beforehand, one could decide whether SAT is a satisfying approach based on the data that is collected. Running the algorithm on more data sets with more varying numbers of reviewers and submissions would likely give a better indication of the running time, and could also give a better picture of the relation between the reviewer/submission ratio and the (average) social welfare and equity. No clear relation between the ratio and social welfare, or between the ratio and equity, can currently be seen, but this is based on the results of only five data sets. A pattern might emerge when results for more data sets are gathered.

It might also be worthwhile to see whether other SAT solvers, like MiniSAT, show similar running times and similar results. It could be the case that other SAT algorithms produce results of similar quality within a shorter time, but this cannot be said in advance.

It might also be interesting to take a different approach to the algorithm's optimisation method. This could be done in several ways. One way would be to use negative bid scores combined with a minimum instead of a maximum cardinality. The algorithm would then try to keep the total cost as low as possible by choosing matches that deduct the least amount of points. However, this may give a similar result, as the optimisation would probably work the same way except inverted, so the resulting matches could be identical in many cases. Yet this is not certain, and the approach might be interesting to look into because it makes more sense from the reviewers' perspective: the number of matches per reviewer should be kept as low as possible without hurting the scores and the number of reviews per match too much.


Another possible approach is using equity as the measure for optimisation. This might be interesting because it could make it easier to find a good middle ground between the number of reviews per submission and the percentage of matches with yes bids.

Something that has not been looked into in this thesis is strategyproofness. A model is considered strategyproof when no agent or group of agents is able to change the matches that are made to their own advantage by declaring false preferences [Barberà et al., 2010]. It is possible that some of the models created in this experiment are strategyproof, especially the models with a very high percentage of yes bids, but this would have to be verified or falsified by examining the created models. Despite not being investigated in this research, strategyproofness might be an important factor to take into account. When looking into strategyproofness, it is probably equally important to look at fairness [Narang et al., 2020]: strategyproofness also affects fairness, and both factors should therefore be taken into account in future research into either of them.

Another interesting possibility, which might also help when trying to achieve strategyproofness and fairness, is to apply K-means to the data. With this clustering algorithm, it might be possible to distinguish groups of reviewers with their own area of expertise. This information could then be used to change bid scores for individuals in case a bid for a submission deviates from the bids of other reviewers for the same submission. While this might be risky, since it could increase the number of matches with no bids, it could also increase the overall quality of reviews, strategyproofness and fairness. A maybe bid of a reviewer might lean more towards a yes, which might be detected by the algorithm and then be chosen over a bid that leans more towards no: the bid scores for maybe bids could be increased or decreased depending on the bids of reviewers with similar knowledge.

The usage of K-means might even be taken a step further. When looking at bids from certain groups of reviewers, the research area of submissions might even be distinguishable. This could then be used to create a classification or preference for submissions for certain groups of reviewers, which would change the reviewers/submissions matching problem into a many-to-many two-sided matching problem with two-sided preference instead of one-sided preference. An advantage could be that this creates new possibilities for determining the quality of matches.
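To make the clustering idea concrete, the sketch below groups reviewers by their bid vectors with a tiny hand-rolled k-means. The data, the yes=1.0/maybe=0.5/no=0.0 encoding, the deterministic initialisation, and k=2 are all illustrative assumptions; in practice a library implementation such as scikit-learn's `KMeans` would be used.

```python
# Minimal k-means sketch for grouping reviewers by bid vectors.
def kmeans(points, k, iters=20):
    centroids = points[:k]  # deterministic init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest centroid (squared Euclidean).
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters

# Hypothetical bid vectors, one per reviewer, over three submissions
# (yes = 1.0, maybe = 0.5, no = 0.0).
reviewers = [
    (1.0, 1.0, 0.0),  # reviewer A: yes, yes, no
    (1.0, 0.5, 0.0),  # reviewer B: yes, maybe, no
    (0.0, 0.0, 1.0),  # reviewer C: no, no, yes
    (0.0, 0.5, 1.0),  # reviewer D: no, maybe, yes
]
groups = kmeans(reviewers, k=2)
print([len(g) for g in groups])  # reviewers with similar bids end up together
```

Here reviewers A and B end up in one cluster and C and D in the other; a maybe bid could then be nudged up or down depending on the yes/no bids of the rest of its cluster.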


Bibliography

[Abraham, 2003] Abraham, D. J. (2003). Algorithmics of two-sided matching problems. PhD thesis, Citeseer.

[Alekhnovich et al., 2006] Alekhnovich, M., Hirsch, E. A., and Itsykson, D. (2006). Exponential lower bounds for the running time of DPLL algorithms on satisfiable formulas. In SAT 2005, pages 51–72. Springer.

[Balint et al., 2015] Balint, A., Belov, A., Järvisalo, M., and Sinz, C. (2015). Overview and analysis of the SAT Challenge 2012 solver competition. Artificial Intelligence, 223:120–155.

[Barberà et al., 2010] Barberà, S., Berga, D., and Moreno, B. (2010). Individual versus group strategy-proofness: When do they coincide? Journal of Economic Theory, 145(5):1648–1674.

[Biere et al., 2009] Biere, A., Heule, M., and van Maaren, H. (2009). Handbook of satisfiability, volume 185. IOS press.

[Bjørner and de Moura, 2019] Bjørner, N. and de Moura, L. (2019). The inner magic behind the Z3 theorem prover.

[Bjørner and Nachmanson, 2020] Bjørner, N. and Nachmanson, L. (2020). Navigating the universe of Z3 theory solvers. In Brazilian Symposium on Formal Methods, pages 8–24. Springer.

[Brandt et al., 2016] Brandt, F., Conitzer, V., Endriss, U., Lang, J., and Procaccia, A. D. (2016). Handbook of computational social choice. Cambridge University Press.

[Bright et al., 2019] Bright, C., Gerhard, J., Kotsireas, I., and Ganesh, V. (2019). Effective problem solving using SAT solvers. In Maple Conference, pages 205–219. Springer.
