
Development of a converter

for rule-base representations


Layout: typeset by the author using LaTeX.


Development of a converter

for rule-base representations

and the integration of heuristics to infer priorities

Max R.A. van den Heuvel
11844442

Bachelor thesis
Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
Dr. G. Sileno
Informatics Institute
Faculty of Science
University of Amsterdam
Science Park 907
1098 XG Amsterdam

June 26th, 2020


Abstract

In law, priority across normative directives is determined following various principles. This method of priority determination can also be applied to rule bases containing logic rules. This paper presents a Python implementation of existing conversion algorithms between constraint-based and priority-based rule base representations, by means of the Quine-McCluskey Boolean simplification algorithm. Exploiting this conversion, this work proposes a method for computing the priority determination for the rules in given rule bases.


Contents

1 Introduction
  1.1 Research Question
  1.2 Overview

2 Theoretical Foundation
  2.1 Rule Bases
    2.1.1 Constraint-based
    2.1.2 Priority-based
  2.2 Priority Determination
    2.2.1 Spatial
    2.2.2 General
    2.2.3 Temporal
  2.3 Parsing
  2.4 Quine McCluskey Algorithm

3 Method
  3.1 Designing a grammar for rule bases
  3.2 Loader
  3.3 Data Structure
  3.4 Quine McCluskey Tools
  3.5 Conversion Algorithms
    3.5.1 From Priority-based to Intermediate Constraint-based
    3.5.2 From Intermediate to Full-tabular Constraint-based
    3.5.3 From Full-tabular to Intermediate Constraint-based
    3.5.4 From Intermediate Constraint-based to Priority-based
  3.6 Priority Determination of Rules

4 Results

5 Discussion

6 Conclusion and Future Work

Appendix
  A P Grammar
  B Hello Grammar


Chapter 1

Introduction

Law is commonly known as a system of rules enforced by the government to regulate conduct (Robertson n.d.). In law, a precedent is a principle or rule established in previous legal cases which can be used as a reason or example for a similar action or decision at a later time (Steinberg n.d.). A precedent rule can be applied when the previous case uses the same principles as the subsequent case to evaluate the issue. Additionally, a legal precedent can be distinguished if some principles of the previous decision are absent or different in the subsequent case. Finally, a legal precedent is said to be overruled if a more recent decision decides against the previously made decision or action. This can occur when antecedent decisions become erroneous due to new developments or legislation. In this research, this form of revision in law will be generalized for a generic intelligent agent with knowledge described by a rule base.

The algorithms used in precedential reasoning in law can be used for the analysis and conversion of a rule base. This research is aimed at the development and direct computational implementation of these algorithms for rule bases. Two different types of representations for a rule base are the priority-based and the constraint-based representation. These rule base representations can be explained in terms of two well-known control-flow constructs.

The first control-flow construct is the conditional statement, better known as the if/else statement. This expression performs different actions based on programmer-specified Boolean conditions (Conditional (Computer Programming) n.d.). A recursive series of conditional statements can be used to nest conditions during programming. The second construct is known as the switch/case operator or switch statement. This construct is a method of allowing variables or expressions to change the flow of the program execution (Switch Statement n.d.). A series of cases in a switch can be used to flatten the conditions during programming.

The priority-based representation can be associated with a rule base where a strictly ordered hierarchy is defined, as in a nested series of if/else statements. There is no hierarchy defined in the constraint-based representation, which can instead be associated with a switch statement.
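To make the analogy concrete, the sketch below contrasts the two flavours on the two-rule example used later in Section 3.5 (p <- a with low priority, -p <- b with high priority); the function and variable names are illustrative only.

# Priority-based flavour: a nested if/else where the order encodes priority.
def conclusion_pb(a, b):
    if b:            # higher-priority rule: -p <- b
        return "-p"
    elif a:          # lower-priority rule:   p <- a
        return "p"
    return None

# Constraint-based flavour: an order-independent set of fully specified cases,
# comparable to the cases of a switch statement.
cases = [
    (lambda a, b: a and not b, "p"),
    (lambda a, b: b,           "-p"),
]

def conclusion_cb(a, b):
    return next((concl for cond, concl in cases if cond(a, b)), None)

assert conclusion_pb(True, False) == conclusion_cb(True, False) == "p"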

1.1 Research Question

Horty 2011 and Sileno, Boer, and Van Engers 2015 have studied and developed algorithms inspired by precedential reasoning for analysing rule bases and the impact of introducing new rules to these rule bases. This project concerns the implementation and possible optimization of these four existing algorithms that convert between a priority-based and a constraint-based rule base representation, based on Boolean simplification (e.g. the Quine-McCluskey algorithm). As a novel contribution, the project elaborates, develops and implements the integration of several types of heuristics (based on position, order, or generality) to infer priority from the rule base.

These priority heuristics can also be found in law. The position principle is formalized in a legal doctrine stating that newer laws tend to override older laws. In addition, the generality principle in law states that more specific laws override more general laws. These heuristics recur in law and daily life, which motivates integrating them into rule bases.

1.2 Overview

Following this introduction, section two presents the theoretical foundation of the tools and algorithms used. Section three explains the implementation of the research, structured by the information outlined in section two. In section four, the final results are given, followed by their evaluation and interpretation in section five. Finally, section six concludes the research, and the paper ends by covering open issues and suggestions for future work.


Chapter 2

Theoretical Foundation

This section presents the main concepts, models, and methods applied during this project. It provides the required information to understand and carry out the research.

2.1 Rule Bases

A rule base is a set of rules, which can be ordered to achieve priority determination. A rule consists of a premise followed by a conclusion, both of which are formulas. The premise implies the conclusion, meaning that when the premise is true, the conclusion is implied to be true as well. This view is known as the declarative or logical view on rules. In this project, a formula consists of literals combined with operators, forming Boolean functions. A literal is an atomic proposition, which can be either true or false, and an operator can be either AND or OR.

The premise of a rule can be subsumed in or shared amongst other rules. Two rules can be conflicting when their conclusions are opposites. When this applies, the problem of determining the right conclusion occurs.

A possible solution, when the correct conclusion is known in certain conditions, would be rewriting or modifying the conflicting rules at the horizontal level by modifying their premises for those conditions. Another solution could be defining a hierarchy between the rules, determining a priority of application at the vertical level. These approaches are directly related to different representations of rule bases, namely the constraint-based and priority-based rule bases respectively (Sileno, Boer, and Van Engers 2015).


2.1.1 Constraint-based

A constraint-based (CB) rule base is a non-ordered set, meaning the position of the rules is irrelevant. A rule in a constraint-based representation can be converted to explicitly contain every relevant factor; when this is applied to a rule base, it becomes a full-tabular rule base. In a full-tabular rule base every rule contains information about every relevant literal. The redundancy of a full-tabular CB (FTCB) rule base can be reduced to an intermediate CB (ICB) representation via Boolean simplification (Sileno, Boer, and Van Engers 2015). A switch case is an example of a CB representation, which allows for the execution of code blocks depending on conditions. Every condition needs to be coded explicitly for a switch case to be functional.

2.1.2 Priority-based

A rule base defined by a strictly ordered hierarchy is called a priority-based (PB) rule base. The ordering can be temporal or positional; the former is for example found in dialogical interactions while the latter is found in texts. A PB rule base uses the composition of the rules to reduce redundancy and increase compactness. In this type of rule base the conjunction of already established factors can be removed, thus making the rule evaluation more efficient. However, the application of PB rules might require more computational power than CB rules, as the evaluation has to be performed following the priority ordering, which might be sub-optimal (ibid.). In coding, a priority-based representation can be seen as an if/else statement: higher-priority or more likely cases are checked before other cases to ensure efficiency.

2.2 Priority Determination

When a situation occurs where two conclusions are in conflict, a possible hierarchy can be determined based on the priority of the rules at a vertical level. In the following situation (a classic in the non-monotonic literature (Pearl 1990)) the rules state a conflict, so a priority determination is needed:

flies <- bird.
bird <- penguin.
-flies <- penguin.

First it is mentioned (in a simplified propositional form) that birds fly, followed by the statements that a penguin is a bird but does not fly, which contradicts the previous statement that birds fly. Intuitively, the rule that a penguin does not fly has a higher priority than the fact that birds fly. This can be achieved in a rule base by assigning an order to the rules, checking the generality of the rules, or inspecting the temporal creation of the rules.

2.2.1 Spatial

Priority determination can be achieved by specifying an order for rules or groups of rules in a rule base. In this case a higher order will lead to a higher priority whereas a lower order will give the rules a lower priority, known in law as lex posterior derogat legi priori (Fellmeth and Horwitz 2009a). In the previous example, the fact that a penguin does not fly would have the highest order and the fact that birds fly the lowest. This can be achieved in a rule base by, for example, having the highest-priority rules at the bottom of the rule base and the lowest-priority rules at the top.

2.2.2 General

In most cases more specific rules have a higher priority than more general rules; in law this occurs with the principle lex specialis derogat legi generali (Fellmeth and Horwitz 2009b). When a formula is subsumed by another formula, the other formula is more specific and thus should, according to this principle, have a higher priority. The more specific rules introduce priority constraints on the conclusion. In the penguin example, a rule stating that a penguin is a bird and is not able to fly overrules the fact that birds can fly, since it is more specific. This principle can be applied by determining whether a rule is subsumed by other rules.

2.2.3 Temporal

A third method for determining priorities uses the recency of creation: in other words, a more recent rule has a higher priority than a less recent rule. In law this is also known as lex posterior derogat legi priori (Fellmeth and Horwitz 2009a). This can be achieved in a similar way to the spatial method, where priority is determined using the order of the rules, but in the horizontal dimension.

2.3 Parsing

In order to treat rule bases in a systematic fashion, it was decided to implement a parser that takes as input textual files with a specific format. The parser has been designed using ANTLR.


ANTLR stands for ANother Tool for Language Recognition; it is a tool that allows the user to generate parsers and lexers for designed languages. A parser builds a data structure, often in the form of a parse tree. In this parse tree the syntactic or structural relation between tokens is represented while checking for the correct format of the input. The tokenization of the input is done by the lexer (ANTLR n.d.). The grammar contains the lexer and parser rules, which are analyzed in the order they are written and can be ambiguous (Tomassetti n.d.)1.

The ANTLR library provides two different mechanisms for traversing the constructed parse tree: a listener and a visitor mechanism. Both walkers use recursive depth-first traversal, but the listener methods are called automatically by the ANTLR walker object, meaning all the nodes of the tree will be visited, whereas the visitor methods are not called automatically by the walker. Another difference is that a visitor method can have a return value of any type when exiting or entering a node, while a listener method cannot return anything. Furthermore, the visitor uses the call stack to store data during the tree traversal whereas the listener uses an explicit stack on the heap (Srivastav 2017).

When ANTLR constructs a walker for the parse tree using a listener mechanism, it also constructs a class with functions for entering and exiting each parser rule. This class can be inherited by a subclass in which the enter and exit functions from the walking mechanism can be overridden. In this paper this subclass is called a loader, which allows for the construction of data structures.

2.4 Quine McCluskey Algorithm

Two of the rule base conversion algorithms developed in this project require a Boolean simplification algorithm (from intermediate CB rule bases to PB rule bases and from full-tabular CB to intermediate CB rule bases).

Quine McCluskey (QM) (Quine 1955) is one of the most used Boolean simplification algorithms. It uses the determination of essential prime implicants to simplify and minimize Boolean functions. The running time of the algorithm grows exponentially with the number of variables2.

An implicant is a conjunction of literals, connected with the AND operator, which implies a conclusion. More importantly, prime implicants are implicants that cannot be covered by a more general implicant; they are fully reduced. To find the prime implicants, the inputs that evaluate to one (or true), known as minterms, are written in their binary form, where each binary position refers to a variable being either positive or negative (Implicant n.d.).

1Most of the syntax from ANTLR3 still works in ANTLR4, the newer version. The ANTLR3 syntax can be found here: https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687036/ANTLR+Cheat+Sheet

2Faster algorithms using Neural Networks and cross correlation exist, suitable for Boolean functions with many variables (El-Bakry and Mastorakis n.d.), but require more research to be implemented correctly.

Formula      Minterm   Binary
-a and -b    0         00
-a and b     1         01
a and -b     2         10
a and b      3         11

The first step is grouping the minterms with equal amounts of ones in their binary form. Afterwards the minterms are compared within the groups, and the positions where the binary terms differ in only a single digit are replaced with a dash to indicate that the truth value of that variable is irrelevant. The combined implicants are then compared with the other combined implicants and the steps are repeated until no reduction is possible. This results in minimal implicants, which are then compared to find the essential prime implicants3 (Quine 1955).

For example, when minimizing the formulas -a and b and a and b, the corresponding minterms 1 and 3 with binary values 01 and 11 differ in the first variable. Replacing that position with a dash results in the binary value -1. Rewriting this to a formula results in b, which is what we would expect when reducing these formulas.
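The core reduction step can be sketched in a few lines of Python; the helper below is illustrative and not the implementation used in this project.

def combine(term_a, term_b):
    """Combine two binary terms that differ in exactly one position by
    replacing that position with a dash, marking the variable as irrelevant."""
    diff = [i for i, (x, y) in enumerate(zip(term_a, term_b)) if x != y]
    if len(diff) != 1:
        return None          # the terms cannot be combined
    i = diff[0]
    return term_a[:i] + "-" + term_a[i + 1:]

# Minterms 1 ('01', -a and b) and 3 ('11', a and b) reduce to '-1', i.e. b.
print(combine("01", "11"))   # prints: -1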

3An online implementation by Suman Adhikari can be found here: https://repl.it/@SumanAdhikari/Quine-McCluskey-Algorithm#main.py; the minterms need to be entered and QM will be applied.


Chapter 3

Method

The conversion algorithms need to be developed and applied to a rule base. In order to achieve this, a rule base needs to be designed. In order to represent logic rules in the rule base, a grammar, parser, and relevant data structure are required. On this data structure, the implemented conversion algorithms can be applied. Afterwards the priority heuristics can be implemented so that they can be used in the conversion algorithms.

3.1 Designing a grammar for rule bases

When starting the project, the first attempt at specifying rule bases was made using basic readers and processors to handle text files and strings of rules. The text files contained the rules, which were transferred into tuples containing the rule, premise, conclusion, and operators. This was quickly replaced by ANTLR, which generates the parser, lexer, and parse tree. ANTLR allows for more efficient performance and already checks for the correct syntax of the grammar when creating the parse tree.

Two other versions of the grammar were implemented, which can be found in the appendix (A and B). They were later replaced by the Tagma grammar.

Tagma Grammar

The Tagma grammar is the final version of the grammar designed for the rules. The labels used in previous grammars were removed because their exit functions were not utilised in the loader and they did not add any needed utilities when creating a data structure. On top of that, the decision was made to remove the possibility of having an operator in the conclusion. This was done due to the difficulty of checking whether two formulas are equal or in conflict. There are many configurations of formulas containing operators which lead to the same or conflicting conclusions, while a conclusion consisting of only a literal is either conflicting, equal, or irrelevant.

The implemented grammar:

grammar Tagma;

/* Parser Rules */

prog : expr+ EOF ;
expr : conclusion IMPLIES premise end ;

conclusion : literal ;

premise : lp=LBRA premise rp=RBRA
        | premise AND premise
        | premise OR premise
        | literal
        ;

literal : NEG ID
        | ID
        ;

end : DOT ;

/* Lexer Rules */

AND     : 'and' ;
OR      : 'or' ;
IMPLIES : '<-' ;
NEG     : '-' ;
LBRA    : '(' ;
RBRA    : ')' ;
DOT     : '.' ;
ID      : [a-z] | [A-Z] ;
WS      : [ \t\r\n]+ -> skip ;
ANY     : . ;

There are lexer rules for each possible character in the rules, with the imply character being a left-pointing arrow. The ID rule matches the atom of a literal, being either an uppercase or lowercase character. The second-last lexer rule ensures skipping of line terminations, tabs and whitespace. The final lexer rule catches any non-defined characters. If multiple lexer rules match, ANTLR chooses the first lexer rule based on the given order. This ordering ensures, for example, that the imply character is classified before the negation.

The parser rules define how the rules of a rule base are structured: a program consists of one or more expressions, and an expression consists of a conclusion, the imply token, a premise, and the end of a rule. The premise is recursively defined to simplify the grammar and the conclusion is just a literal. The brackets in the premise are labeled within the rule to simplify checking for their existence. ANTLR looks at the parser rules, and the choices within them, from top to bottom. The base case for the premise will always be a literal, and rules are always terminated using a dot.

The following rules are examples of the syntax of the created grammar:

• p <- a.
• -p <- -a.
• p <- a and b.
• p <- a or b.
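As an illustration, such a rule base can be parsed from Python roughly as follows, assuming the grammar has been compiled to the Python 3 target (producing modules such as TagmaLexer and TagmaParser) and that the rules are stored in a hypothetical file rulebase.txt.

from antlr4 import CommonTokenStream, FileStream

from TagmaLexer import TagmaLexer     # generated by ANTLR (assumed module names)
from TagmaParser import TagmaParser

input_stream = FileStream("rulebase.txt")    # hypothetical input file
lexer = TagmaLexer(input_stream)
tokens = CommonTokenStream(lexer)
parser = TagmaParser(tokens)
tree = parser.prog()                         # parse starting from the prog rule
print(tree.toStringTree(recog=parser))       # textual view of the parse tree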

3.2 Loader

For this research the listener mechanism was chosen over the visitor mechanism to ensure the visit of each node. There is also no need for a return value, since the loader is only used to build a data structure. Another reason for this decision was the potential for rule bases to become very large, which might risk a stack overflow when using the visitor mechanism. To create a data structure, a loader was implemented which overrides the exit listener functions for the parser rules in the grammar.

With the creation of the Tagma grammar the loader was also updated; it overrides the exit functions of the walker for the expression, conclusion, premise, and literal. While exiting the nodes of the tree, the loader decorates them with objects of the correct type. Decorating means assigning objects with the matching values to a context in a dictionary. When exiting the node for a literal, the context of the literal gets added as a key in a dictionary, with as value the literal object containing the atom and the negation.

There are three possible layouts in the parse tree when exiting a premise node, as defined in the grammar. If the premise is a literal, the literal object gets used as decoration for the premise context. Otherwise, if the premise contains an operator, a formula object, including the left and right terms and the operator, is constructed and used to decorate the tree. The final situation is where the premise is surrounded by brackets. In this case the premise has already been added to the decorations, since ANTLR walks through the tree depth first. This premise object then gets used as the value for the premise context.

Since it has been established that the conclusion can only be a literal, the already existing literal object gets used as decoration for the conclusion context. A rule object gets made using the conclusion and premise objects from the decoration tree when exiting an expression. This rule object is then used as decoration for the rule context and added to a list containing all the rules from the input. While walking over the parse tree, a list is filled with the rules represented as objects.
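A minimal sketch of such a loader is given below. It assumes the listener class generated by ANTLR is named TagmaListener, and it uses the Literal, Formula, Operator, and Rule classes sketched in the next section; the accessor names follow ANTLR's generated code, but the details are illustrative.

from antlr4 import ParseTreeWalker
from TagmaListener import TagmaListener     # generated by ANTLR (assumed name)
# Literal, Formula, Operator, Rule: see the data structure sketch in Section 3.3.

class Loader(TagmaListener):
    """Decorates parse-tree contexts with data structure objects (sketch)."""
    def __init__(self):
        self.values = {}     # context -> object ("decorations")
        self.rules = []

    def exitLiteral(self, ctx):
        negated = ctx.NEG() is not None
        self.values[ctx] = Literal(ctx.ID().getText(), negated)

    def exitConclusion(self, ctx):
        self.values[ctx] = self.values[ctx.literal()]

    def exitPremise(self, ctx):
        if ctx.literal() is not None:         # premise : literal
            self.values[ctx] = self.values[ctx.literal()]
        elif ctx.lp is not None:              # bracketed premise, already decorated
            self.values[ctx] = self.values[ctx.premise(0)]
        else:                                 # premise AND/OR premise
            op = Operator.AND if ctx.AND() else Operator.OR
            left, right = (self.values[p] for p in ctx.premise())
            self.values[ctx] = Formula(left, right, op)

    def exitExpr(self, ctx):
        rule = Rule(self.values[ctx.conclusion()], self.values[ctx.premise()])
        self.rules.append(rule)

# Walking the tree produced by the parsing sketch in Section 3.1:
# loader = Loader()
# ParseTreeWalker().walk(loader, tree)
# print(loader.rules)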

3.3 Data Structure

Doing operations on strings is inefficient, and problems arise when trying to track literals or change existing rules, leading to the implementation of the data structure. This data structure avoids operations at the text level and performs them only at the level of computational objects. Classes for the operators, literals, formulas, and rules were implemented to create the data structure objects.

The class defining the operators is an enumerator class containing unique constant values for the used operators. An enumerator is utilised to introduce the operators and to avoid the use of strings.

The class constructing literal objects is initialized with an atom and its state of negation: True for negative literals, False for positive literals. The equality operator was overridden for literal objects, used in unit testing among other things. The hash function was replaced so literal objects can be used as keys in a dictionary, which is utilized in the implementation of the FTCB to ICB conversion. The greater-than operator was also overridden to be able to sort literals alphabetically, used for the QM algorithm. Additionally, functions to negate a literal and to check for equal atoms were created.

A similar approach was taken for the formula class. A formula object is initialized with a list containing the left and right term, and an operator combining these terms. The terms can be either formula or literal objects, as defined by the grammar. The comparison expression was overridden to work for formula objects. In addition, functions were implemented to negate operators and formulas. The formulas can be negated using the negation of an operator and literal. A Boolean function was made to check if literals are specified within a formula, and a function returning the list of literals in a given formula which are unspecified in the object formula. Both of these functions are utilized in the ICB to FTCB conversion. Functions were made to construct the formula objects with the left and right term and operator as input, which is used for creating the data structure in the loader. The formula can also be sorted alphabetically as preparation for the QM algorithm.

The data structure class creates the rule objects. A rule object is built up from the conclusion and premise objects. The final list that gets returned by the loader contains all the rule objects of the rule base. These rule objects are then used in the conversion of representations.
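A condensed sketch of these classes, with illustrative names, is shown below; the actual implementation contains additional helpers (negation of formulas, checking for unspecified literals, alphabetical sorting) that are omitted here.

from enum import Enum

class Operator(Enum):
    """Unique constants for the supported operators."""
    AND = "and"
    OR = "or"

    def negate(self):
        return Operator.OR if self is Operator.AND else Operator.AND

class Literal:
    """An atomic proposition with a negation flag (True = negative literal)."""
    def __init__(self, atom, negated=False):
        self.atom, self.negated = atom, negated

    def __eq__(self, other):
        return isinstance(other, Literal) and \
            (self.atom, self.negated) == (other.atom, other.negated)

    def __hash__(self):          # allows literals to be used as dictionary keys
        return hash((self.atom, self.negated))

    def __gt__(self, other):     # alphabetical order, used before applying QM
        return self.atom > other.atom

    def negate(self):
        return Literal(self.atom, not self.negated)

    def same_atom(self, other):
        return self.atom == other.atom

class Formula:
    """A binary formula: a left and right term joined by an operator."""
    def __init__(self, left, right, operator):
        self.terms, self.operator = [left, right], operator

class Rule:
    """A rule: a premise (Formula or Literal) implying a Literal conclusion."""
    def __init__(self, conclusion, premise):
        self.conclusion, self.premise = conclusion, premise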


3.4 Quine McCluskey Tools

For the Boolean simplification algorithm a library implementation of the QM algorithm by Thomas Pircher was used (Pircher 2019). This implementation was chosen because it has no inherent limits on the size of the input and is known to be considerably faster than other public Python implementations, though still exponential, as mentioned in its read-me. The function takes as input a list of integers which represent the minterms, explained earlier, and returns a set of strings which represent the minimized minterms. Functions were made to convert the formulas to their respective minterms, together with tools to convert the set of strings back to data structure objects. To simplify the conversion to the minterm representation, the formulas can be built in alphabetical order. These functions prepared the formulas for the Boolean simplification.
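The encoding from a full conjunction of literals to a minterm index can be sketched as follows. It assumes the Literal class from the previous sketch and a fixed alphabetical ordering of the variables; the commented-out call into Pircher's package is an assumption about its interface, not a verified usage.

def minterm_index(conjunction, variables):
    """Encode a full conjunction of literals (one per variable, in alphabetical
    order) as the integer index of its minterm."""
    index = 0
    for var in variables:                        # most significant bit first
        lit = next(l for l in conjunction if l.atom == var)
        index = (index << 1) | (0 if lit.negated else 1)
    return index

variables = ["a", "b"]
conjunction = [Literal("a"), Literal("b", negated=True)]    # a and -b
print(minterm_index(conjunction, variables))                # 2, cf. the table in Section 2.4

# A hypothetical call into the quine_mccluskey package (assumed API):
# from quine_mccluskey.qm import QuineMcCluskey
# reduced = QuineMcCluskey().simplify([0, 1, 3])            # set of minimized term strings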

3.5 Conversion Algorithms

For the conversion from one representation to another, four algorithms have been used. These algorithms are based on the paper by Sileno, Boer, and Van Engers 2015.

The following is an example of how the four implemented algorithms convert between the different types of rule base representations. Take the following rule base, where the lowest rule has the highest priority:

• p <- a.
• -p <- b.

This rule base has a priority-based representation and can be converted using the implemented algorithm into a minimal constraint-based representation, becoming:

• p <- a and -b.
• -p <- b.

Here the rule -p <- b. has the highest priority. This rule base can be converted further, using the implemented algorithm, into a full-tabular constraint-based representation. Doing this, the rule base becomes:

• p <- a and -b.
• -p <- a and b.
• -p <- -a and b.

In this representation each rule is expanded with the relevant unspecified factors in the rule base. There is no information about the implications of the combination -a and -b, so it is not included in the tabular representation. The full-tabular representation can then be converted back to the minimal representation using the Quine McCluskey Boolean simplification algorithm. The minimal CB representation can also be converted back to the PB representation using the implemented algorithm. The algorithms used will now be explained further.

3.5.1 From Priority-based to Intermediate Constraint-based

Suppose a strictly ordered set of rules M = {r1, r2, ..., rn} where rn has the highest priority and r1 has the lowest priority. The conversion into an intermediate constraint-based representation can be achieved using the following algorithm:

Algorithm 1 Priority-based to Intermediate Constraint-based
1: for each ri ∈ M, from rn to r1 do
2:   if newRuleList == ∅ then
3:     add ri to the newRuleList
4:   for each rj ∈ newRuleList, from rn to r1 do
5:     if Conclusion(ri) == Conclusion(rj) then
6:       newRule = Conclusion(rj) ← Premise(ri) ∨ Premise(rj)
7:     if Conclusion(ri) == ¬Conclusion(rj) then
8:       newRule = Conclusion(rj) ← ¬Premise(ri) ∧ Premise(rj)
9:     add newRule to newRuleList

Each rule either gets combined when conclusions are equal, or the complement of the premise of a rule with higher priority gets added when conclusions are in conflict. This conversion was introduced by Horty 2011 and can produce cloned rules when the conclusions of two rules are equal and one premise is subsumed by the other. These cloned rules can be removed without harm (Sileno, Boer, and Van Engers 2015).
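As a simplified illustration of the conflict case of this conversion, the sketch below operates on plain (conclusion, premise) pairs with purely conjunctive premises; the equal-conclusion case and the full Formula/Rule objects of the actual implementation are left out.

# Rules as (conclusion, [premise literals]); a leading "-" marks negation.
# Priority-based input, lowest priority first, as in the example of Section 3.5.
pb = [("p", ["a"]), ("-p", ["b"])]

def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def pb_to_icb(rules):
    """Walk from the highest to the lowest priority and guard each lower-priority
    rule with the negated premises of conflicting higher-priority rules."""
    new_rules = []
    for conclusion, premise in reversed(rules):            # highest priority first
        premise = list(premise)
        for higher_conclusion, higher_premise in new_rules:
            if higher_conclusion == negate(conclusion):    # conflicting conclusions
                premise += [negate(l) for l in higher_premise]
        new_rules.append((conclusion, premise))
    return new_rules

print(pb_to_icb(pb))    # [('-p', ['b']), ('p', ['a', '-b'])]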

3.5.2 From Intermediate to Full-tabular Constraint-based

For the conversion to full-tabular, the unspecified relevant factors (literals from premises with conflicting or equal conclusions that are not already in the rule) need to be established and added to the relevant rules. The following algorithm transforms the rules to full-tabular:


Algorithm 2 Intermediate to Full-tabular Constraint-based
1: for each ri ∈ M, from rn to r1 do
2:   for each rj ∈ newRuleList, from rn to r1 do
3:     if Conclusion(ri) == ¬Conclusion(rj) then
4:       get unspecified literals from ri and rj
5:       for literal ∈ unspecified literals of ri and rj do
6:         build sorted formula using literal
7:         build sorted formula using ¬literal
8:       add sorted formulas to newRuleList
9:     else
10:      add newRule to newRuleList

The rules with the highest priority get compared to rules with lower priority. If their conclusions are in conflict, each unspecified literal and its complement from each rule get added to the premises of both rules. When converting into full-tabular, the number of rules grows by a power of two for each unspecified literal (Sileno, Boer, and Van Engers 2015).
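The expansion of a single rule can be sketched with itertools.product on the same simplified (conclusion, premise) representation as before; this is an illustration rather than the project's implementation.

from itertools import product

negate = lambda lit: lit[1:] if lit.startswith("-") else "-" + lit

def expand_rule(conclusion, premise, unspecified):
    """Expand a conjunctive rule with every combination of the unspecified
    literals and their complements, yielding 2**len(unspecified) rules."""
    expanded = []
    for signs in product([False, True], repeat=len(unspecified)):
        extra = [lit if pos else negate(lit) for lit, pos in zip(unspecified, signs)]
        expanded.append((conclusion, sorted(premise + extra)))
    return expanded

# Expanding "p <- a" when b is relevant but unspecified:
print(expand_rule("p", ["a"], ["b"]))    # [('p', ['-b', 'a']), ('p', ['a', 'b'])]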

3.5.3 From Full-tabular to Intermediate Constraint-based

For the conversion from FTCB to ICB, first the rules which share the same conclusion get combined using the OR operator. Afterwards the formulas get converted into their minterm representation and reduced to their minimal form using the Quine McCluskey algorithm (ibid.). To simplify the conversion into the minterm representation, the conversion into full-tabular ensured the literals were in alphabetical order. The reduced minterms then get converted back from a set of strings into the data structure objects to represent the formulas.

3.5.4 From Intermediate Constraint-based to Priority-based

The final algorithm converts a rule base in minimal constraint-based representation to a priority-based representation. For the conversion a priority ordering is needed; this can be provided as input or obtained by evaluation. In the following algorithm the priority ordering is already applied, with rn having the highest priority:


Algorithm 3 Intermediate Constraint-based to Priority-based
1: for each ri ∈ M, from rn to r1 do
2:   θ = Conclusion(ri)
3:   factors = GetRelevantFactors(M, θ)
4:   if not notYetEvalSituations then
5:     notYetEvalSituations = GetRelevantSituations(factors)
6:   if not establishedFactors then
7:     establishedFactors = ∅
8:   ruleBase_i = ConvertToFullTabular(ri, factors)
9:   ruleBase_j = ∅
10:  for each rj ∈ ruleBase_i do
11:    if Premise(rj) ∈ notYetEvalSituations then
12:      newPremise = Premise(rj); apply = true
13:      for each f ∈ establishedFactors do
14:        if ¬f ∈ Premise(rj) then
15:          apply = false; break
16:        else if f ∈ Premise(rj) then
17:          newPremise = newPremise \ f
18:      if apply and newPremise then
19:        newRule = θ ← newPremise
20:        notYetEvalSituations = notYetEvalSituations \ Premise(rj)
21:  establishedFactors = extractFacts(notYetEvalSituations)
22: apply QM on ruleBase_i, obtaining the reduced rule base

The algorithm creates groups of rules with relevant conclusions. Within these groups the redundancy gets removed using Boolean simplification. GetRelevantFactors(M, θ) finds all the relevant literals or factors with respect to the given conclusion and the whole rule base. The function GetRelevantSituations(.) uses these factors and allocates truth values to them to create all possible relevant situations. After converting the highest-priority rule to full-tabular and constructing the new premise based on the established factors, the established facts get removed from the not yet evaluated situations using extractFacts(.). Finally QM gets applied on the group of rules, obtaining the minimal representation.

3.6 Priority Determination of Rules

The implementation of the priority determination is the novel contribution of the project. After the implementation of the conversion algorithms, the following determination methods were developed.


The implementation of priority in this research is done by the order of the rules: the last rule in a rule base will always have the highest priority, and rules higher up in the rule base will have a lower priority. Priority determination is done by determining the order of the rules and sorting them by this order in a rule base. The priority of a rule base can be determined by assigning an order to the rules, checking the generality of the rules, or inspecting the temporal creation of the rules.

To implement priority determination based on order, multiple functions have been implemented to change the order of the rules based on their characteristics. The rule base can be sorted based on the length of the rules, meaning the number of literals and operators in the rules. If the rules are reduced to contain no redundancy, measuring the length is a way of identifying the specificity of the rules. The highest priority can be given to the longest or shortest rule, with decreasing or increasing length leading to lower priority respectively. A rule base can also be inverted, meaning the highest priority will become the lowest priority and vice versa. A final way of sorting rules is based on a given index, which uses a list of indices for each rule and sorts the rule base based on these indices.
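The order-based helpers can be sketched on the simplified (conclusion, premise) tuples used in the earlier sketches; the names and the exact length measure are illustrative.

def rule_length(rule):
    """Length of a conjunctive rule: n premise literals joined by n-1 ANDs,
    plus the conclusion literal (so 2n symbols in total)."""
    _, premise = rule
    return 2 * len(premise)

def sort_by_length(rules, longest_last=True):
    """Order rules by length; with longest_last=True the longest (most specific)
    rule ends up at the bottom, i.e. receives the highest priority."""
    return sorted(rules, key=rule_length, reverse=not longest_last)

def invert(rules):
    """Swap priorities: the old highest-priority rule becomes the lowest."""
    return list(reversed(rules))

def sort_by_index(rules, indices):
    """Reorder the rule base following a user-supplied list of indices."""
    return [rule for _, rule in sorted(zip(indices, rules), key=lambda pair: pair[0])]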

Determining priority based on generality is implemented using a dependency graph. In this graph each literal in a rule base is a node, and premises are connected via edges to conclusions. The edges are then given a weight, and with this weight the dependency between literals is determined. Take for example:

a <- b and c and d.

d <- e.

Here the edges from the literals b, c, and d each have a weight of 1/3 towards the conclusion a. The edge between the literals e and d has a weight of 1. Combining these rules, the weight of literal e towards a can be found by combining the weights, which would be 4/3. Utilizing these weights, the dependency can be found, as well as whether a rule is subsumed by another rule. A conclusion with multiple literals assigned to it via implication will have a combination of more specific literals with a higher weight and more generic literals with a lower weight. This way the more specific and more generic rules can be distinguished. By sorting the rule base on dependencies, priority can be given to the more specific rules.
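A sketch of the weighted dependency graph is given below; the weighting and the way weights are combined follow the description above and are the author's heuristic, so the code is purely illustrative.

from collections import defaultdict

def dependency_weights(rules):
    """Each premise literal gets an edge towards the conclusion with weight
    1/len(premise)."""
    weights = defaultdict(dict)
    for conclusion, premise in rules:
        for literal in premise:
            weights[literal][conclusion] = 1 / len(premise)
    return weights

rules = [("a", ["b", "c", "d"]), ("d", ["e"])]
w = dependency_weights(rules)
print(w["b"]["a"], w["e"]["d"])    # 0.333... and 1.0
# Combining the weights along the path e -> d -> a as described above gives 1/3 + 1 = 4/3.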

Determining priority based on the temporal creation can be achieved by simply adding the newest rules to the bottom of the rule base. This way the rules already in the rule base will move up and the highest priority shifts towards the newly added rule.


Chapter 4

Results

The final result of the project is an implementation in Python of the algorithms explained in the section Conversion Algorithms, which convert rule bases written in the implemented grammar syntax.

As a novel contribution, several types of meta-rules and heuristics have been integrated to infer priority from the rule base. The priority of the rules in the rule base can be changed using simple priority determination based on general application conditions and on the order assigned to groups of rules. The conversion algorithms and heuristics have been developed, tested using unit testing, and published on GitHub1. The individual functions of the program were tested using unit testing.

With object-oriented programming languages like Python, the units mostly consisted of a class or a function in a class. First the base functions were tested to ensure a correct basis for the whole program. When this basis was established, more comprehensive tests for complex functions were created. The explicit static unit tests ensure the functionality of the previously written code when adding new functions. In addition, problems that may occur are found early during the development of a program (Hamill 2005). A standard way of writing functions is to start with the unit tests to assure the final result of the function. In total 35 unit tests were implemented.
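An illustrative unit test in this style is shown below; it assumes the Literal class sketched in Section 3.3 is importable from a hypothetical module named datastructure.

import unittest

from datastructure import Literal    # hypothetical module name

class TestLiteral(unittest.TestCase):
    def test_negate(self):
        lit = Literal("a", negated=False)
        self.assertEqual(lit.negate(), Literal("a", negated=True))
        self.assertTrue(lit.same_atom(lit.negate()))

if __name__ == "__main__":
    unittest.main()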

As an illustrative example of the application of the conversion algorithms, take the following rule base:

# If it rains, the ground will be wet.
w <- r.
# When the sprinkler is on, the ground will be wet.
w <- s.
# If there are some heavy clouds, it will get dark.
d <- hc.
# If it rains, there are clouds (if there are clouds, it may rain).
c <- r.
# Heavy clouds are clouds.
c <- hc.
# During the night, it is dark.
d <- n.
# If it is sunny, there are no clouds.
-c <- ss.

1The implementation of the algorithms is available on https://github.com/

The rule base is read as a priority-based rule base. Here the rule saying it is dark during the night has the highest priority and the rule stating that when it rains the ground is wet has the lowest priority. The "it is sunny" condition ss limits the application of the "there are clouds" condition c, which leads to the following rule base when converted to a constraint-based representation:

# When the sprinkler is on or when it rains, the ground will be wet.
w <- s or r.
# When there are some heavy clouds or when it rains, and it is not sunny, there are clouds.
c <- (hc and -ss) or (r and -ss).
# During the night or when there are some heavy clouds, it is dark.
d <- n or hc.
# If it is sunny, there are no clouds.
-c <- ss.

According to this rule base it can only be cloudy if the sun is not out. If somebody tells you it is dark and wet outside, the following facts get added to the rule base, always as priority-based rules:

# It is dark and wet.
d.
w.

Since the grammar does not support facts yet, in order to use the code developed for this project these facts get added as rules using a temporary condition t:

# It is dark and wet.
d <- t.
w <- t.

When converting the newly created rule base back to the constraint-based representation, the result will be different. For illustrative purposes this temporary condition is taken out of the following outputs, performing some additional selection to process contradictions and tautologies in the premises. Doing this, the following rule base gets extracted:


# When there are some heavy clouds or when it rains, and it is not sunny, there are clouds.
c <- (hc and -ss) or (r and -ss).
# If it is sunny, there are no clouds.
-c <- ss.
# It is dark.
d.
# The ground is wet.
w.

The rules stating it may be dark or wet get replaced with the facts stating it is dark and wet. Now suppose somebody more trustworthy (a stronger normative source) says the ground is in fact not wet, as in the fact:

# It is not wet.
-w.

This fact will have a higher priority than the previously stated facts due to the difference in trust. When adding this fact to the priority-based representation and converting it into the constraint-based representation, the following rule base gets constructed:

# When there are some heavy clouds or when it rains, and it is not sunny, there are clouds.
c <- (hc and -ss) or (r and -ss).
# If it is sunny, there are no clouds.
-c <- ss.
# It is dark.
d.
# It is not wet.
-w.

Here the rule stating that the ground will be wet when the sprinkler is on or when it rains becomes false, due to the previously added rule having the highest strength. The rule stating that the ground can be wet thus disappears. The fact stating it is dark has not been argued against, and is still maintained.

When reversing the order of the facts stating the condition of the ground (wet or not wet), the result of the conversion is similar to the previous rule base, except that the final rule base concludes the ground is wet instead of not wet, as expected when inverting the priorities of the fact statements.


Chapter 5

Discussion

Here the limitations and flaws of the implementation will be discussed, together with an evaluation of the results.

The first limitation of the implementation is the fact that the grammar only allows operators in the premise. This simplified the implementation of the conversion algorithms, since conclusions containing operators do not have to be checked for equality, but it limits the possibilities of the algorithms.

While the progress was clear, the implementation of the conversion algorithms still needs further testing. During the project the algorithms were tested successfully with small hand-made data sets; a downside of this is the limited variety of rule base configurations. There might be some edge cases which are missed and thus not tested. An obvious solution is testing with bigger data sets. These rule bases can be found online, made by hand, or even generated using machine learning.

A second problem is the fact that Quine McCluskey is computationally expensive. It was chosen as the Boolean simplification algorithm because of its simple implementation and the property of always giving the optimal solution. A possible replacement is Espresso, which uses less computational power in exchange for sub-optimal solutions (Rudell 1986).

An open issue with the constraint-based to priority-based conversion is the fact that rules containing multiple operators sometimes get used as the base case for the new priority-based rule base. This leads to missing rules due to the comparison with this new rule base.

A second known issue is the lack of breaks and logs when parsing the grammar. The ANTLR parser automatically checks the grammar during parsing but only shows an error message when the syntax of the rule base is incorrect. Because of this, rule bases with invalid syntax can still be used in the conversion algorithms, which leads to incorrect results. On top of that, the implementation of literals is limited to only one character and there is no real way of introducing facts.

Another known issue is the possibility of duplicate rules after conversion. Rules can be duplicated within the rule base, or literals can be duplicated within a rule. These duplicates still need to be removed after the conversion; this does not affect the correctness of the results, only their readability.


Chapter 6

Conclusion and Future Work

During this thesis the implementation of four existing algorithms that convert priority-based rule bases into constraint-based rule bases and vice versa has been explored. In addition, the integration of several types of heuristics, based on order and generality, to infer priority from the rule base has been developed and implemented. The Boolean simplification was done using the Quine McCluskey algorithm. A grammar for the rule bases was created, which gets parsed by ANTLR. In addition, a data structure providing the objects used during the conversion was implemented, together with utilities to use the Quine McCluskey algorithm.

Some suggestions for the further development and improvement of this project are given below.

It was decided to design a grammar where operators in the conclusion of a rule are not possible. This decision was made to give more time to other parts of the project, because it simplified the comparison of conclusions. The grammar could be improved to work with operators within conclusions as well as premises. Copying the grammar syntax of the premises to the conclusions would ensure this improvement. Because of this expansion, the loader would need to be adjusted as well. The exit function of the premise could then be adjusted to work for conclusions with operators and brackets. Finally, the comparison of these more complicated conclusions would need to be implemented to check for conflicting and equal conclusions, which is used in the conversion algorithms.

The grammar can also be extended to work with multi-character literals, together with the introduction of facts in the grammar. The grammar can further be extended with the possibility to add comments, to give the opportunity of providing more information in a rule base. The parser would then ignore these comments.

Another improvement would be to increase the number of tests done for debugging. As mentioned in the previous section, more and bigger rule bases need to be tested with the conversion and priority determination algorithms. These rule bases can be made by hand, generated by machine, or found online. Doing more debugging with bigger data sets will most likely reveal the flaws of the implementation, which can then be solved to optimise the code.

The implementation can also be made more user-friendly by conceiving and building an interface for the converter. Such an interface could, for example, allow changing the priority determination method as well as choosing the conversion algorithm. The current implementation lacks an interface, meaning the code needs to be adjusted to convert the representations of a rule base.

Other logical simplification algorithms can also be tested in future work. As discussed in section five, the current Boolean simplification algorithm QM is computationally expensive. Simplification algorithms like Espresso (Rudell 1986) can be used instead; this algorithm is less computationally expensive but can result in sub-optimal solutions.

As a final note, I took the opportunity during this project to start learning the tools and methods which are used as standards at production level. Unit testing was used for every implementation, and written before the actual implementation. This way the process was easily followed and measured. This additionally led to refactored code which kept simplicity and readability, for both parties involved, throughout the design process. Working at production level established clean results in the form of code, clear progress throughout the project, and taught me useful skills like creating grammars, working with ANTLR, and unit testing.


Bibliography

[1] ANTLR. https://www.antlr.org/.

[2] Hazem M. El-Bakry and Nikos Mastorakis. “A Fast Computerized Method For Automatic Simplification of Boolean Functions”. In: Systems Theory and Scientific Computation ().

[3] Conditional (Computer Programming). https://en.wikipedia.org/wiki/Conditional_(computer_programming). Accessed: 2020-06-23.

[4] A. Fellmeth and M. Horwitz. “Lex posterior derogat (legi) priori”. In: Guide to Latin in International Law (2009). Accessed: 2020-06-20.

[5] A. Fellmeth and M. Horwitz. “Lex specialis derogat legi generali”. In: Guide to Latin in International Law (2009). Accessed: 2020-06-20.

[6] Paul Hamill. “Unit Test Frameworks”. In: (2005).

[7] Implicant. https://en.wikipedia.org/wiki/Implicant. Accessed: 2020-06-20.

[8] John F. Horty. “Rules and Reasons in the Theory of Precedent”. In: Legal Theory 17 (2011), pp. 1–33.

[9] Judea Pearl. “System Z: A Natural Ordering of Defaults with Tractable Applications to Nonmonotonic Reasoning”. In: Proceedings of the 3rd Conference on Theoretical Aspects of Reasoning about Knowledge. TARK ’90. Pacific Grove, California: Morgan Kaufmann Publishers Inc., 1990, pp. 121–135. isbn: 1558801058.

[10] Thomas Pircher. “A python implementation of the Quine McCluskey algorithm”. In: (2019).

[11] Willard Van Orman Quine. “A Way to Simplify Truth Functions”. In: The American Mathematical Monthly 62.9 (1955), pp. 627–631.

[12] Geoffrey Robertson. Crimes against humanity.

[13] Richard L. Rudell. “Multiple-Valued Logic Minimization for PLA Synthesis”. In: Memorandum No. UCB/ERL M86-65 (1986).


[14] Giovanni Sileno, Alexander Boer, and Tom Van Engers. “A Constructivist Approach to Rule Bases”. In: Leibniz Center for Law (2015).

[15] Saumitra Srivastav. Antlr4 - Visitor vs Listener Pattern. https://saumitra.me/blog/antlr4-visitor-vs-listener-pattern/. 2017.

[16] Stefan Steinberg. “Precedent”. In: Legal Information Institute (). Accessed: 2020-06-24.

[17] Switch Statement. https://en.wikipedia.org/wiki/Switch_statement. Accessed: 2020-06-23.

[18] Gabriele Tomassetti. The ANTLR Mega Tutorial. https://tomassetti.me/antlr-mega-tutorial/.


Appendix

Grammars

A P Grammar

grammar p;

/* Parser Rules */

p : rule* EOF ;
rule : proposition '<-' proposition '.' NEWLINE ;
proposition : VAR (COMPOUND VAR)? ;

/* Lexer Rules */

fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;

VAR : '-'? (LOWERCASE | UPPERCASE)+ ;
COMPOUND : ('and' | 'or') ;
NEWLINE : ('\r'? '\n' | '\n')+ ;
WS : [ \t]+ -> skip ;

B Hello Grammar

grammar Hello;

/* Parser Rules */

prog : expr+ EOF ;
expr : conclusion IMPLIES premise end ;

conclusion : lc=LBRA conclusion rc=RBRA   # braConc
           | conclusion AND conclusion    # andConc
           | (NEG ID | ID)                # genConc
           ;

premise : lp=LBRA premise rp=RBRA   # braPrem
        | premise AND premise       # andPrem
        | premise OR premise        # orPrem
        | literal+                  # genPrem
        ;

literal : NEG ID   # negID
        | ID       # genID
        ;

end : DOT ;

/* Lexer Rules */

AND     : 'and' ;
OR      : 'or' ;
IMPLIES : '<-' ;
NEG     : '-' ;
LBRA    : '(' ;
RBRA    : ')' ;
DOT     : '.' ;
ID      : [a-z] | [A-Z] ;
WS      : [ \t\r\n]+ -> skip ;
ANY     : . ;
