LP with Flexible Grouping and Aggregates Using Modes


Marcin Czenko¹ and Sandro Etalle¹,²

¹ Department of Computer Science, University of Twente, The Netherlands
marcin.czenko@utwente.nl

² Eindhoven University of Technology, The Netherlands
s.etalle@tue.nl

Abstract. We propose a new grouping operator for logic programs based on the group_by operator of SQL. The novelty of our proposal lies in the use of modes, which allows us to relax some rather impractical constraints on variable occurrences while retaining a straightforward semantics. Moreover, modes allow us to prove properties regarding groundness of computed answer substitutions and termination. The resulting class of programs enjoys a simple and intuitive semantics.

1 Introduction

In a system designed to answer queries (be it a database or a logic program) an aggregate function is designed to be carried out on the set of answers to a given query rather than on a single answer. For example, in a Datalog program containing one entry per employee, one needs aggregate functions to compute data such as the average age or salary of the employees, the number of employees, etc.

Aggregate functions are useful in practice, and paramount in database systems. Indeed, the reason why we address the problem here is of a practical nature: we are developing a language for trust management [10,12,16] based on (partially function-free) moded logic programming; this language is called TuLiP. To express trust management rules, TuLiP must be able to express statements such as “employee X will be granted access to confidential document Y provided that the majority of senior executives recommends him”, which requires the use of aggregates.

To keep flexibility, we wanted to implement aggregates by first having a grouping goal (like findall, which reports all answers satisfying a certain goal) and then applying the aggregate to the collected answers. This is slightly less efficient than having aggregate operations integrated with grouping, but it has the advantage of flexibility, as it allows the user to define her own aggregate predicates. We tried to do this in Prolog and, while it was clearly possible to implement aggregates by hacking, we discovered that the present theoretical basis of findall/3 and related predicates in LP would allow us neither to express the statements we needed nor to demonstrate the correctness of TuLiP rules containing aggregates. In particular, although grouping has been extensively studied in the past (see, e.g., [19,14,8], and the Related Work section for further details), these approaches make strong assumptions on variable sharing that we could not satisfy (in a grouping goal, non-aggregate variables may not be shared with other atoms in the query, which for our purposes is too restrictive), and they did not allow us to prove some crucial properties such as groundness of answers and termination.

To solve this problem, in this paper we introduce a new higher-order moded predicate group_set/4 that can be used as a basis to compute aggregate functions. This predicate is basically an extension of the well-known findall/3 predicate, with the addition of grouping variables and with the advantage of being moded. This allows us to lift the restriction on variable sharing we mentioned above. In addition, we show that two of the main properties one can prove for well-moded programs (groundness of answers and – under additional constraints – termination) extend in a straightforward way to programs using group_set. We argue that group_set is more flexible and easier to use than previous approaches.

Two technical novelties of this paper are the introduction of the concept of local variables in well-moded programs, and the presence of a sort of polymorphism in the predicates: group_set does not have a fixed mode, but its mode depends on the mode of the goal being aggregated.

A word of warning: group_set is a higher-order predicate, and to provide a thoroughly consistent treatment of programs including it we would have had to extend the theory of well-moded programs to higher-order programs. Doing so to prove the properties of a single predicate would have been overkill. We keep a lower profile, and we show the properties we need without resorting to new theories, at the price, however, of a loss of elegance in the approach.

The paper is structured as follows. In Section 2 we present the notational conventions used in this paper. In Section 3 we state the basic facts about well-moded and well-terminating logic programs. In Section 4 we introduce simple grouping queries and show how to use them in programs that do not contain grouping subgoals. In Section 5 we show how to use grouping in programs. Here we generalise the notion of well-moded logic programs to those including grouping subgoals. In Section 6 we discuss the properties of the well-moded programs containing grouping atoms. The paper finishes with Related Work in Section 7 and Conclusions in Section 8.

2 Preliminaries on Pure Prolog Programs

In what follows we study definite logic programs executed by means of LD-resolution, which consists of SLD-resolution combined with the leftmost selection rule. The reader is assumed to be familiar with the terminology and the basic results of the semantics of logic programs [2,3,17]. Here we adopt the notation of [3] in that we use boldface characters to denote sequences of objects; therefore t denotes a sequence of terms while B is a sequence of atoms, i.e. a query (following [3], queries are simply conjunctions of atoms, possibly empty). We denote atoms by A, B, H, . . . , queries by A, B, C, . . . , clauses by c, d, . . . , and programs by P. For any atom A, we denote by Pred(A) the predicate symbol of A. For example, if A = p(a, X), then Pred(A) = p. The empty query is denoted by □ and the set of clauses defining a predicate is called a procedure.

For any syntactic object (e.g., atom, clause, query) o, we denote by Var(o) the set of variables occurring in o. Given a substitution σ = {x1/t1, . . . , xn/tn} we say that {x1, . . . , xn} is its domain (denoted by Dom(σ)) and that Var({t1, . . . , tn}) is its range (denoted by Ran(σ)). Further, we denote by Var(σ) the set Dom(σ) ∪ Ran(σ). If t1, . . . , tn is a permutation of x1, . . . , xn, then we say that σ is a renaming. The composition of substitutions is denoted by juxtaposition (θσ(X) = σ(θ(X))). We say that a syntactic object (e.g., an atom) o is an instance of o′ iff o = o′σ for some σ; further, o is called a variant of o′, written o ≈ o′, iff o and o′ are instances of each other. A substitution θ is a unifier of objects o and o′ iff oθ = o′θ. We denote by mgu(o, o′) any most general unifier (mgu, in short) of o and o′.

(LD) computations are sequences of LD derivation steps. The non-empty query q : B, C and the clause c : H ← B (renamed apart wrt q) yield the resolvent (B, C)θ, provided that θ = mgu(B, H). A derivation step is denoted by $B, \mathbf{C} \overset{\theta}{\Longrightarrow}_{c} (\mathbf{B}, \mathbf{C})\theta$, and c is called its input clause. A derivation is obtained by iterating derivation steps. A maximal sequence

$$\delta := \mathbf{B}_0 \overset{\theta_1}{\Longrightarrow}_{c_1} \mathbf{B}_1 \overset{\theta_2}{\Longrightarrow}_{c_2} \cdots \mathbf{B}_n \overset{\theta_{n+1}}{\Longrightarrow}_{c_{n+1}} \mathbf{B}_{n+1} \cdots$$

of derivation steps is called an LD derivation of P ∪ {B0} provided that for every step the standardisation apart condition holds, i.e., the input clause employed at each step is variable disjoint from the initial query B0 and from the substitutions and the input clauses used at earlier steps. If the program P is clear from the context and the clauses c1, . . . , cn+1, . . . are irrelevant, then we drop the reference to them. If δ is maximal and ends with the empty query (Bn = □), then the restriction of θ1 · · · θn to the variables of B0 is called its computed answer substitution (c.a.s., for short). The length of a (partial) derivation δ, denoted by len(δ), is the number of derivation steps in δ.

A multiset is a collection of elements that are not necessarily distinct [19]. The number of occurrences of an element x in a multiset M is its multiplicity in the multiset, and is denoted by mult(x, M ). When describing multisets we use the notation that is similar to that of the sets, but instead of { and } we use [[ and ]] respectively. For example, M = [[ 1, 1, 2 ]] is a multiset where mult(1, M ) = 2 and mult(2, M ) = 1.
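For readers who want to experiment, the multiset M = [[ 1, 1, 2 ]] of the example can be modelled in Python with collections.Counter. This is only an illustration of the notation, not part of the formal development; the helper name mult simply mirrors the text.

```python
from collections import Counter

# A multiset as a Counter: each element is mapped to its multiplicity.
M = Counter([1, 1, 2])          # the multiset [[ 1, 1, 2 ]]

def mult(x, m):
    """Multiplicity of x in the multiset m (0 if x does not occur)."""
    return m[x]

print(mult(1, M), mult(2, M), mult(3, M))   # prints: 2 1 0
# The order of elements is irrelevant, as for multisets:
print(Counter([2, 1, 1]) == M)              # prints: True
```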

3 Well-Moded Logic Programs

Informally speaking, a mode indicates how the arguments of a relation should be used, i.e. which are the input and which are the output positions of each atom; modes allow one to derive properties such as absence of run-time errors for Prolog built-ins and absence of floundering for programs with negation [4].

Definition 1 (Mode). Consider an n-ary predicate symbol p. By a mode for p we mean a function m_p from {1, . . . , n} to {In, Out}.

If m_p(i) = In (resp. Out), we say that i is an input (resp. output) position of p (with respect to m_p). We assume that each predicate symbol has a unique mode associated to it; multiple modes may be obtained by simply renaming the predicates. We use the notation (X1, . . . , Xn) to indicate the mode m in which m(i) = Xi. For instance, (In, Out) indicates the mode in which the first (resp. second) position is an input (resp. output) position. To benefit from the advantage of modes, programs are required to be well-moded [4], which means that they have to respect some correctness conditions relating the input arguments to the output arguments. We denote by In(A) (resp. Out(A)) the sequence of terms filling in the input (resp. output) positions of A, and by Var_In(A) (resp. Var_Out(A)) the set of variables occupying the input (resp. output) positions of A.

Definition 2 (Well-Moded). A clause H ← B1, . . . , Bn is well-moded if for all i ∈ [1, n]

$$\mathit{Var}_{In}(B_i) \subseteq \bigcup_{j=1}^{i-1} \mathit{Var}_{Out}(B_j) \cup \mathit{Var}_{In}(H), \quad \text{and}$$

$$\mathit{Var}_{Out}(H) \subseteq \bigcup_{j=1}^{n} \mathit{Var}_{Out}(B_j) \cup \mathit{Var}_{In}(H).$$

A query A is well-moded iff the clause H ← A is well-moded, where H is any (dummy) atom of zero arity. A program is well-moded if all of its clauses are well-moded.

Note that the first atom of a well-moded query is ground in its input positions and a variant of a well-moded clause is well-moded. The following Lemma, due to [4], shows the “persistence” of the notion of well-modedness.

Lemma 1. An LD-resolvent of a well-moded query and a well-moded clause that is variable-disjoint with it is well-moded.

As a consequence of Lemma 1 we have the following well-known properties. For the proof we refer to [7].

1. Let P be a well-moded program and A be a well-moded query. Then for every computed answer σ of A in P, Aσ is ground.

2. Let H ← B1, . . . , Bn be a clause in a well-moded program P. If A is a well-moded atom such that γ0 = mgu(A, H) and, for some j ∈ [1, n − 1] and every i ∈ [1, j], there exists a successful LD derivation of P ∪ {Biγ0 · · · γi−1} with computed answer substitution γi, then Bj+1γ0 · · · γj is a well-moded atom.

A concept we need in the sequel is that of terminating program; since we are dealing with well-moded programs, the natural definition we refer to is that of well-terminating programs.

Definition 3. A well-moded program is called well-terminating iff all its LD-derivations starting in a well-moded query are finite.

Termination of (well-moded) logic programs has been exhaustively studied in [5,6,9,11,21,13]. Here we follow the approach of Etalle, Bossi, and Cocco [13].

4 Simple group_set queries

The most intuitive (and modular) way of carrying out aggregate queries is to first collect all the answers to a query and then aggregate the values we are interested in. For instance, if we need to know the average age of our employees, we first make a list of the ages of all employees and then we calculate their average. Here we follow the same approach and we introduce the new group_set predicate, which can be seen as a moded counterpart of the well-known findall. Let us see some intuitive examples of how group_set works in simple queries, i.e. queries consisting of only the group_set construct (so for the moment we do not use it in programs).

Definition 4. A grouping atom is an atom of the form A = group_set(t, gl, Goal, x), where t is a term, gl is a list of distinct variables each of which appears in Goal, Goal is an atomic query (but not a grouping atom itself), and x is a free variable.

Definition 4 requires that Goal is atomic. This simplifies the treatment (in particular the treatment of modes) and is not a real restriction, as one can always define new predicates to break down a nested grouping atom into a number of grouping atoms that satisfy Definition 4.

Let us now see some examples, showing how this works. Consider the program P containing the following facts:

p(a,1). p(a,2). p(b,3). p(b,4).

First, if the list gl is empty, then group_set(t, [ ], Goal, x) has the same functionality as the classical built-in findall, and x is instantiated to the list of instances of t corresponding to the computed answer substitutions of the query Goal:

Q: group_set(Z,[],p(Y,Z),X). A: X=[1,2,3,4].

On the other hand, if gl is not empty, then the computed answer substitutions to Goal are grouped in sets of answers with the following property: two computed answer substitutions σ1 and σ2 belong to the same set if and only if glσ1 = glσ2.

Q: group_set(Z,[Y],p(Y,Z),X). A: Y=a, X=[1,2]; Y=b, X=[3,4].

4.1 Semantics of simple group_set queries

A subtle difficulty in providing a reasonable semantics for group_set is that we have to take into consideration the multiplicity of answers. In a typical situation, group_set will be used to compute, e.g., averages, as in the query group_set(W,[Y],p(Y,W),X), average(X,Z). To this end, X should actually be instantiated to a multiset of terms corresponding to the answers of the query p(Y,W). However, since Prolog does not support multisets nicely, for the sake of simplicity in a practical implementation we are forced to use a list instead. The disadvantage of using a list is that it is order-dependent: by permuting the elements of a list one can obtain a different list. In the (natural) implementation, given the query group_set(. . . , . . . , Goal, x), the c.a.s. will instantiate x to a list of elements whose order depends on the order in which the computed answer substitutions to the query Goal are computed. This depends in turn on the order of the clauses in the program. Consequently, unless we extend the syntax to include multisets, the semantics of group_set is bound to be different from the classical LP semantics (any of them), as it depends on the order of the clauses in the program. This is annoying from the viewpoint of theory, but acceptable in practice. To cope with this problem we provide here two semantics: the first one, more “declarative”, assumes that multisets of terms are part of the universe of discourse and that a multiset operator [[ ]] is available, while the second one does not. So in the following definition we assume that the universe of discourse includes multisets.

Definition 5 (c.a.s. to group_set using Multisets). Let P be a program, and A = group_set(t, gl, Goal, x) be a query. The multiset [[ α1, . . . , αk ]] of computed answer substitutions of P ∪ A is defined as follows:

1. Let Σ = [[ σ1, . . . , σn ]] be the multiset of c.a.s. of P ∪ Goal.

2. Let Σ1, . . . , Σk be a partitioning of Σ such that two answers σi and σj belong to the same partition iff glσi = glσj.

3. For each Σi, let tsi be the multiset of terms obtained by instantiating t with the substitutions in Σi, i.e. tsi = [[ tσ | σ ∈ Σi ]], and let gli = glσ where σ is any substitution from Σi.

4. For i ∈ [1, k], αi is the substitution {gl/gli, x/tsi}.

Example 1. Let P be a program containing the following facts: p(a,c,1), p(a,d,1), p(a,e,3), p(b,c,2), p(b,d,2), p(b,e,4). Let A = group_set(Z,[Y], p(Y,W,Z),X). Then P ∪ A yields the following two computed answer substitutions: α1= {Y/a, X/[[ 1, 1, 3 ]]} and α2= {Y/b, X/[[ 2, 2, 4 ]]}.
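Example 1 can be checked mechanically with a Python sketch of Definition 5, using Counter as the multiset type. As a simplifying assumption, t is restricted to a single variable, and the c.a.s. of Goal are given as a list of dicts; group_set_ms is a hypothetical name.

```python
from collections import Counter

def group_set_ms(t_var, gl, answers):
    """Definition 5, restricted to the case where t is a single variable:
    map each binding of gl to the multiset (Counter) of t-instances."""
    parts = {}
    for sigma in answers:
        key = tuple(sigma[v] for v in gl)   # equal gl-instances share a partition
        parts.setdefault(key, Counter())[sigma[t_var]] += 1
    return parts

# c.a.s. of p(Y,W,Z) for the facts of Example 1
facts = [('a', 'c', 1), ('a', 'd', 1), ('a', 'e', 3),
         ('b', 'c', 2), ('b', 'd', 2), ('b', 'e', 4)]
answers = [{'Y': y, 'W': w, 'Z': z} for y, w, z in facts]
result = group_set_ms('Z', ['Y'], answers)
print(result[('a',)] == Counter([1, 1, 3]))   # prints: True
print(result[('b',)] == Counter([2, 2, 4]))   # prints: True
```

The variable W takes part neither in the grouping nor in the collected term, so it only affects how many answers each partition receives.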

As we said, since Prolog does not support multisets, in the sequel we use lists instead. The disadvantage of using lists is that they are order-dependent, and that if a multiset contains two or more different elements, then there exists more than one list “representing” it. Here we simply accept this shortcoming and tolerate the fact that, in real Prolog programs, the aggregating variable x will be instantiated to one of the possible lists representing the multiset of answers.

Definition 6 (c.a.s. to group_set using Lists). Let P be a program, and A = group_set(t, gl, Goal, x) be a query. The multiset [[ α1, . . . , αk ]] of computed answer substitutions of P ∪ A is defined as follows:

1. Let Σ = [[ σ1, . . . , σn ]] be the multiset of c.a.s. of P ∪ Goal.

2. Let Σ1, . . . , Σk be a partitioning of Σ such that two answers σi and σj belong to the same partition iff glσi = glσj.

3. For each i, let ∆i be an ordering on Σi, i.e. a list of substitutions containing the same elements as Σi, counting multiplicities.

4. For each ∆i = [σi1, . . . , σim], let tsi be the list of terms obtained by instantiating t with the substitutions in ∆i, i.e. tsi = [tσi1, . . . , tσim], and let gli = glσ where σ is any substitution from ∆i.

5. For i ∈ [1, k], αi is the substitution {gl/gli, x/tsi}.

Example 2. Take the same program P as in Example 1 but with a different order of the clauses: p(a,c,1), p(a,e,3), p(a,d,1), p(b,e,4), p(b,c,2), p(b,d,2). Again, let A = group_set(Z,[Y],p(Y,W,Z),X). Then P ∪ A yields the following two computed answer substitutions using lists: α1 = {Y/a, X/[1,3,1]} and α2 = {Y/b, X/[4,2,2]}.
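The contrast between Examples 1 and 2 can be made concrete: under the list-based semantics the two clause orders produce different lists, yet each pair of lists represents the same multiset. The sketch below assumes, as before, that t is a single variable and that answers arrive in clause order; group_lists and answers_of are hypothetical helpers.

```python
from collections import Counter

def group_lists(t_var, gl, answers):
    """Definition 6 sketch: each group keeps the instances of t_var as a list,
    in the order in which the answers were computed."""
    groups = {}
    for sigma in answers:
        groups.setdefault(tuple(sigma[v] for v in gl), []).append(sigma[t_var])
    return groups

def answers_of(facts):
    # c.a.s. of p(Y,W,Z), produced in clause order
    return [{'Y': y, 'W': w, 'Z': z} for y, w, z in facts]

ex1 = answers_of([('a', 'c', 1), ('a', 'd', 1), ('a', 'e', 3),
                  ('b', 'c', 2), ('b', 'd', 2), ('b', 'e', 4)])
ex2 = answers_of([('a', 'c', 1), ('a', 'e', 3), ('a', 'd', 1),
                  ('b', 'e', 4), ('b', 'c', 2), ('b', 'd', 2)])

l1, l2 = group_lists('Z', ['Y'], ex1), group_lists('Z', ['Y'], ex2)
print(l1)  # {('a',): [1, 1, 3], ('b',): [2, 2, 4]}
print(l2)  # {('a',): [1, 3, 1], ('b',): [4, 2, 2]}
# The lists differ, but they represent the same multisets:
print(all(Counter(l1[k]) == Counter(l2[k]) for k in l1))  # True
```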

In the sequel, we refer to this second definition. Notice that, to bring this definition into practice, i.e., to actually compute the answer to a query group_set(t, gl, Goal, x), we have to require that P ∪ Goal terminates.

5 Using group_set in queries and programs

In this section we discuss the use of group_set in programs. Here we are going to take advantage of modes, which will allow us to lift the (rather restrictive) conditions on variable sharing of previous approaches ([19,14]), and to prove groundness and termination properties. So, before we discuss the semantics of the operator, we first need to introduce a mode for it.

5.1 Modes

The mode of the query group_set(t, gl, Goal, x) depends on the mode of the Goal, so it is not fixed a priori. In addition, we introduce the concept of local variables.

Definition 7. Let A = group_set(t, gl, Goal, x). We define the following sets of input, output and local variables for A:

– Var_In(A) = Var_In(Goal),

– Var_Out(A) = (Var(gl) \ Var_In(A)) ∪ {x},

– Var_Local(A) = Var(A) \ (Var_In(A) ∪ Var_Out(A)).

For example, let A = group_set(q(W,Y,Z),[Y],p(W,Y,Z),X) be an aggregate atom, and assume that the original mode of p is (In, Out, Out). Then Var_In(A) = {W}, Var_Out(A) = {X, Y}, and Var_Local(A) = {Z}.
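The three sets of Definition 7 can be computed directly. The Python sketch below makes a simplifying assumption not present in the definition: every argument of Goal is a plain variable, so the input variables can be read off position by position; grouping_modes is a hypothetical name.

```python
def grouping_modes(t_vars, gl, goal_args, goal_mode, x):
    """Definition 7 for group_set(t, gl, Goal, x), assuming every argument of
    Goal is a plain variable; t_vars is Var(t), goal_mode the mode of Goal."""
    var_in = {a for a, m in zip(goal_args, goal_mode) if m == 'In'}
    var_out = (set(gl) - var_in) | {x}
    all_vars = set(t_vars) | set(gl) | set(goal_args) | {x}   # Var(A)
    var_local = all_vars - (var_in | var_out)
    return var_in, var_out, var_local

# group_set(q(W,Y,Z),[Y],p(W,Y,Z),X) with p moded (In, Out, Out)
modes_of_a = grouping_modes({'W', 'Y', 'Z'}, ['Y'], ['W', 'Y', 'Z'], ['In', 'Out', 'Out'], 'X')
print([sorted(s) for s in modes_of_a])   # prints: [['W'], ['X', 'Y'], ['Z']]
```

Note that a variable of gl which is also an input of Goal is not counted as an output, which is why the set difference is taken before adding x.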

Now, we can extend the definition of well-moded program in the obvious way to take into consideration group_set atoms; the only extra care we have to take is that local variables should not appear elsewhere in the clause (or query).

Definition 8 (Well-Moded-Extended). We say that the clause H ← B1, . . . , Bn is well-moded if for all i ∈ [1, n]

$$\mathit{Var}_{In}(B_i) \subseteq \bigcup_{j=1}^{i-1} \mathit{Var}_{Out}(B_j) \cup \mathit{Var}_{In}(H),$$

$$\mathit{Var}_{Out}(H) \subseteq \bigcup_{j=1}^{n} \mathit{Var}_{Out}(B_j) \cup \mathit{Var}_{In}(H),$$

and

$$\mathit{Var}_{Local}(B_i) \cap \Big( \bigcup_{j \in \{1,\dots,i-1,i+1,\dots,n\}} \mathit{Var}(B_j) \cup \mathit{Var}(H) \Big) = \emptyset.$$

A query A is well-moded iff the clause H ← A is well-moded, where H is any (dummy) atom of zero arity. A program is well-moded if all of its clauses are well-moded.
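Definition 8 adds a single extra check to Definition 2. The Python sketch below abstracts each body atom to its variable sets (for a grouping atom these would come from Definition 7; for a regular atom the local set is empty). The class ModedAtom and the function well_moded_ext are hypothetical names. The example is the query q(Y), group_set(Z,[],p(Y,Z),X) with q moded (Out) and p moded (In, Out), first as is and then extended with an extra atom that reuses the local variable Z.

```python
from dataclasses import dataclass, field

@dataclass
class ModedAtom:
    """An atom abstracted to its variable sets; for grouping atoms these come
    from Definition 7, for regular atoms var_local is empty."""
    var_in: set
    var_out: set
    var_local: set = field(default_factory=set)

    @property
    def vars(self):
        return self.var_in | self.var_out | self.var_local

def well_moded_ext(head_in, head_out, body):
    """Definition 8: the conditions of Definition 2, plus the requirement that
    local variables of a body atom occur nowhere else in the clause."""
    known = set(head_in)
    for b in body:
        if not b.var_in <= known:
            return False
        known |= b.var_out
    if not set(head_out) <= known:
        return False
    head_vars = set(head_in) | set(head_out)
    for i, b in enumerate(body):
        elsewhere = head_vars.union(*(c.vars for j, c in enumerate(body) if j != i))
        if b.var_local & elsewhere:
            return False
    return True

# q(Y), group_set(Z,[],p(Y,Z),X): Y is an input of the grouping atom, Z is local
q = ModedAtom(var_in=set(), var_out={'Y'})
g = ModedAtom(var_in={'Y'}, var_out={'X'}, var_local={'Z'})
print(well_moded_ext(set(), set(), [q, g]))     # prints: True
# Appending r(Z), with r moded (Out), shares the local variable Z
r = ModedAtom(var_in=set(), var_out={'Z'})
print(well_moded_ext(set(), set(), [q, g, r]))  # prints: False
```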

It is worth mentioning that in the approach of [19] and [14] all variables in Goal not occurring in the grouping list (gl) must not occur elsewhere in the clause or query containing this grouping atom (using our notation: they are all local). Here we relax this condition: input variables are not local and can occur elsewhere in the clause containing this grouping atom. This is possible thanks to the mode information, because input variables are ground at the moment the grouping is performed. For instance, the query q(Y),group_set(Z,[],p(Y,Z),X) is allowed in our system (provided that the modes of p and q are (In, Out) and (Out), respectively), while it is not allowed by the definitions in [14] or [19], because of the shared variable Y. This significantly improves the practical applicability of the grouping construct (for instance, the program in the forthcoming Example 3 would not be allowed by the previous definitions).

5.2 LD Derivations with Grouping

We extend the definition of LD-resolution to queries containing group_set atoms.

Definition 9 (LD-resolvent with grouping). Let P be a program. Let ρ : B, C be a query. We distinguish two cases:

1. If B is a group_set atom and α is a c.a.s. for B in P, then we say that B, C and P yield the resolvent Cα. The corresponding derivation step is denoted by $B, \mathbf{C} \overset{\alpha}{\Longrightarrow}_{P} \mathbf{C}\alpha$.

2. If B is a regular atom and c : H ← B is a clause in P renamed apart wrt ρ such that H and B unify with mgu θ, then we say that ρ and c yield the resolvent (B, C)θ. The corresponding derivation step is denoted by $B, \mathbf{C} \overset{\theta}{\Longrightarrow}_{c} (\mathbf{B}, \mathbf{C})\theta$.

As usual, a maximal sequence of derivation steps starting from query B is called an LD derivation of P ∪ {B} provided that for every step the standardisation apart condition holds.

Example 3. The Financial Administration (fa) of the University of Twente makes monthly summaries of the expenses made within several projects. Each expense is represented by a predicate expense/4, moded (In, Out, Out, Out), where the first argument is the research group making the expense, the second argument represents the project to be charged, the third argument is the amount used, and the last one is a time-stamp. A research group within a department is denoted by research_group(Dept,RGroup), moded (In, Out).

expense(dies,ishare,2200,'25-01-2007').
expense(dies,ishare,2200,'25-02-2007').
expense(caes,ishare,1000,'10-03-2007').
expense(caes,ishare,2200,'25-03-2007').
expense(dies,istrice,1200,'25-01-2007').
expense(caes,istrice,1400,'25-02-2007').
research_group(ewi,dies).
research_group(ewi,caes).

Now imagine that one is interested in the list of expenses made by each research group in the ewi department, grouped by project and formatted as expense(RGroup,Project,Amount). Then one can use the following query:

A = research_group(ewi,W),

group_set(expense(W,Y,Z),[Y],expense(W,Y,Z,V),X)

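For the grouping atom G = group_set(expense(W,Y,Z),[Y],expense(W,Y,Z,V),X) of A we have Var_In(G) = {W}, Var_Out(G) = {X, Y}, and Var_Local(G) = {V, Z}. The instances of A under its computed answer substitutions (ground except for the local variables Z and V) are: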

(1) research_group(ewi,dies), group_set(expense(dies,ishare,Z),[ishare],expense(dies,ishare,Z,V),[expense(dies,ishare,2200),expense(dies,ishare,2200)])

(2) research_group(ewi,dies), group_set(expense(dies,istrice,Z),[istrice],expense(dies,istrice,Z,V),[expense(dies,istrice,1200)])

(3) research_group(ewi,caes), group_set(expense(caes,ishare,Z),[ishare],expense(caes,ishare,Z,V),[expense(caes,ishare,1000),expense(caes,ishare,2200)])

(4) research_group(ewi,caes), group_set(expense(caes,istrice,Z),[istrice],expense(caes,istrice,Z,V),[expense(caes,istrice,1400)])
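The four answers can be reproduced with a small Python model of the query, assuming the facts above are stored as tuples and that answers are produced in clause order. The point it illustrates is the one made in Section 5.1: by the time the grouping is performed, the shared variable W has been made ground by research_group(ewi,W), so it can act as an input of the grouping atom. The function answers is a hypothetical name.

```python
# Facts of Example 3 (the time-stamp is the fourth argument)
expense = [
    ('dies', 'ishare', 2200, '25-01-2007'),
    ('dies', 'ishare', 2200, '25-02-2007'),
    ('caes', 'ishare', 1000, '10-03-2007'),
    ('caes', 'ishare', 2200, '25-03-2007'),
    ('dies', 'istrice', 1200, '25-01-2007'),
    ('caes', 'istrice', 1400, '25-02-2007'),
]
research_group = [('ewi', 'dies'), ('ewi', 'caes')]

def answers(dept):
    """Yield (W, Y, X) for research_group(dept,W),
    group_set(expense(W,Y,Z),[Y],expense(W,Y,Z,V),X)."""
    for d, w in research_group:           # first conjunct: W is an output
        if d != dept:
            continue
        groups = {}                       # W is ground here: an input of the grouping atom
        for w2, y, z, _v in expense:      # Z and V are local to the grouping atom
            if w2 == w:
                groups.setdefault(y, []).append(('expense', w, y, z))
        for y, x in groups.items():       # one answer per distinct value of Y
            yield w, y, x

for w, y, x in answers('ewi'):
    print(w, y, x)
```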

In order to compute the sum and average of the expenses made by a research group grouped by the project, one may extend the program above with the following rules:

sum_avg(RGroup,Proj,Sum,Avg) :-
    group_set(Z,[Proj],expense(RGroup,Proj,Z,Y),X),
    sum(X,Sum,Len),
    Avg is Sum/Len.

sum([],0,0).
sum([H|T],Sum,Len) :-
    sum(T,Sum1,Len1),
    Sum is Sum1+H,
    Len is Len1+1.
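The aggregation step can be mirrored in Python on the lists of amounts that group_set collects for each project. This is an illustrative sketch only: sum_len mirrors sum/3 (recursing on the tail first, as the Prolog clause does) and sum_avg mirrors the body of the sum_avg rule after grouping; both names are hypothetical Python counterparts.

```python
def sum_len(xs):
    """Mirror of sum/3: returns (Sum, Len), computing the tail first."""
    if not xs:
        return 0, 0
    s, n = sum_len(xs[1:])
    return s + xs[0], n + 1

def sum_avg(amounts):
    """Mirror of the sum_avg rule after grouping: amounts plays the role of X."""
    s, n = sum_len(amounts)
    return s, s / n   # a group produced by group_set is never empty, so n > 0

print(sum_avg([2200, 2200]))   # prints: (4400, 2200.0)   dies, ishare
print(sum_avg([1000, 2200]))   # prints: (3200, 1600.0)   caes, ishare
```

Because grouping happens first, the aggregate is an ordinary user-defined predicate; replacing the average by, say, a maximum only changes the second step.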

6 Properties

There are two main properties we can prove for programs containing grouping atoms: groundness of computed answer substitutions and – under additional constraints – termination.

Groundness Well-moded group_set atoms enjoy the same features as regular well-moded atoms. The following lemma is a natural consequence of Lemma 1.

Lemma 2. Let P be a well-moded program and A = group_set(t, gl, Goal, x) be a grouping atom in which gl is a list of variables. Take any ground σ such that Dom(σ) = Var_In(A). Then each c.a.s. θ of P ∪ Aσ is ground on A's output variables, i.e. Dom(θ) = Var_Out(A) and Ran(θ) = ∅.

Proof. By noticing that Var_In(A) = Var_In(Goal) and that each variable in the grouping list gl appears in Goal, the proof is a straightforward consequence of Lemma 1.

Termination Termination is particularly important in the context of grouping queries, because if Goal does not terminate (i.e., if some LD derivation starting in Goal is infinite), then the grouping atom group_set(t, gl, Goal, x) does not return any answer (it loops).

If the grouping atom occurs only in the top-level query and there are no grouping atoms in the bodies of the program clauses, then, to ensure termination, it is sufficient to require that P be well-terminating in the way described in [13], i.e. that for every well-moded non-grouping atom A, all LD derivations of P ∪ A are finite. If this condition is satisfied and the grouping atom is well-moded (so that Goal is well-moded when it is selected), then all LD derivations of P ∪ Goal are finite, and hence the query group_set(t, gl, Goal, x) terminates.

On the other hand, if we allow grouping atoms in the body of the clauses, then we have to make sure that the program does not include recursion through a grouping atom. The following example shows what can go wrong here.

Example 4. Consider the following program:

(1) p(X,Z) :- group_set(Y,[X],q(X,Y),Z).
(2) q(X,Z) :- group_set(Y,[X],p(X,Y),Z).
(3) q(a,1).
(4) q(a,2).
(5) q(b,3).
(6) q(b,4).

Here p and q are defined in terms of each other through the grouping operation. Therefore p(X,Z) cannot terminate until q(X,Y) terminates (clause 1). Computation of q(X,Y) in turn depends on the termination of the group_set operation on p(X,Y) (clause 2). Intuitively, one would expect that the model of this program contains q(a,1), q(a,2), q(b,3), and q(b,4). However, if we apply the extended LD resolvent (Definition 9) to compute the c.a.s. of p(X,Y), we see that the computation loops.

In order to prevent this kind of problem, to guarantee termination we require programs to be aggregate stratified [14]. Aggregate stratification is similar to the concept of stratified negation [2,23], and puts syntactical restrictions on the aggregate programs so that recursion through group_set does not occur. For the notation, we follow Apt et al. [2]. Before we proceed to the definition of stratified programs we need to formalise the following notions. Given a program P and a clause H ← . . . , B, . . . ∈ P:

– if B is a grouping atom group_set(t, gl, Goal, x), then we say that Pred(H) refers to Pred(Goal);

– otherwise, we say that Pred(H) refers to Pred(B).

We say that relation symbol p depends on relation symbol q in P, denoted p ⊒ q, iff (p, q) is in the reflexive and transitive closure of the relation refers to. Given a non-grouping atom B, the definition of B is the subset of P consisting of all clauses whose head has the predicate symbol Pred(B). Finally, p ≃ q ≡ p ⊑ q ∧ p ⊒ q means that p and q are mutually recursive, and p ⊐ q ≡ p ⊒ q ∧ p ≄ q means that p calls q as a subprogram. Notice that ⊐ is a well-founded ordering.

Definition 10. A program P is called stratified if for every clause H ← B1, . . . , Bm in it, and every Bj in its body, we have that

– if Bj is a grouping atom Bj = group_set(. . . , . . . , Goal, . . .), then Pred(Goal) ≄ Pred(H).
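Definition 10 can be checked by computing the relation ⊒ as a reflexive-transitive closure and then testing every grouping atom for mutual recursion with its clause head. In the Python sketch below, the clause encoding is an assumption made for illustration: a clause is (head predicate, [(kind, predicate), . . .]), where kind is 'group' for a grouping atom (with predicate = Pred(Goal)) and 'atom' otherwise. The programs of Examples 4 and 5 are used as test cases.

```python
# Assumed clause encoding: (head_pred, [(kind, pred), ...]), with kind 'group'
# for group_set(..., ..., Goal, ...) where pred = Pred(Goal), 'atom' otherwise.

def depends_on(clauses):
    """Reflexive-transitive closure of 'refers to': pred -> preds it depends on."""
    preds = {h for h, _ in clauses} | {p for _, body in clauses for _, p in body}
    reach = {p: {p} for p in preds}                  # reflexive
    for h, body in clauses:
        reach[h] |= {p for _, p in body}             # refers to
    changed = True
    while changed:                                   # transitive closure by iteration
        changed = False
        for p in preds:
            new = set().union(*(reach[q] for q in reach[p]))
            if not new <= reach[p]:
                reach[p] |= new
                changed = True
    return reach

def stratified(clauses):
    """Definition 10: Pred(Goal) must not be mutually recursive with Pred(H)."""
    reach = depends_on(clauses)
    return all(not (g in reach[h] and h in reach[g])
               for h, body in clauses for kind, g in body if kind == 'group')

ex4 = [('p', [('group', 'q')]), ('q', [('group', 'p')])] + [('q', [])] * 4
ex5 = [('q', [('atom', 'r')]), ('r', [('atom', 'q')]), ('p', [('group', 'q')])]
print(stratified(ex4), stratified(ex5))   # prints: False True
```

Since ⊒ is reflexive, a grouping atom whose goal predicate equals the head predicate is rejected as well.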

Given the finiteness of programs, it is easy to show that a program P is stratified iff there exists a partition of it, P = P1 ∪ · · · ∪ Pn, such that for every i ∈ [1, n], every clause cl = H ← B1, . . . , Bm ∈ Pi, and every Bj in its body, the following conditions hold:

1. if Bj = group_set(. . . , . . . , Goal, . . .), then the definition of Pred(Goal) is contained within $\bigcup_{k<i} P_k$,

2. otherwise, the definition of Pred(Bj) is contained within $\bigcup_{k \le i} P_k$.
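One way to construct such a partition is to assign each predicate a stratum number satisfying s(H) ≥ s(p) for ordinary body atoms and s(H) > s(Pred(Goal)) for grouping atoms, and then to put each clause in the stratum of its head. The Python sketch below does this by upward relaxation, using the same assumed clause encoding as for Definition 10 (built-ins such as is/2 are treated as predicates without clauses). If recursion passes through a grouping atom, the numbers would grow without bound; the sketch detects this by comparing them with the number of predicates, since an acyclic assignment never needs more strata than predicates.

```python
# Assumed clause encoding: (head_pred, [(kind, pred), ...]), kind 'group' or 'atom'.

def strata(clauses):
    """Smallest stratum numbers with s(H) >= s(p) for ordinary body atoms and
    s(H) > s(Pred(Goal)) for grouping atoms; None if no such numbering exists.
    The clauses whose head is in stratum i form P_i."""
    preds = {h for h, _ in clauses} | {p for _, body in clauses for _, p in body}
    s = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for h, body in clauses:
            for kind, p in body:
                need = s[p] + 1 if kind == 'group' else s[p]
                if s[h] < need:
                    if need > len(preds):   # recursion through a grouping atom
                        return None
                    s[h] = need
                    changed = True
    return s

# The rules of Example 3 (facts have empty bodies)
prog = [('expense', []), ('research_group', []), ('sum', []),
        ('sum', [('atom', 'sum'), ('atom', 'is'), ('atom', 'is')]),
        ('sum_avg', [('group', 'expense'), ('atom', 'sum'), ('atom', 'is')])]
print(strata(prog)['sum_avg'])                                   # prints: 2
print(strata([('p', [('group', 'q')]), ('q', [('group', 'p')])]))  # prints: None
```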

Stratification alone does not guarantee termination. The following (obvious) example demonstrates this.

Example 5. Take the following program:

q(X,Y) :- r(X,Y). r(X,Y) :- q(X,Y).

p(Y,X) :- group_set(Z,[Y],q(Y,Z),X).

Notice that q ≃ r. This program is (aggregate) stratified, but the query p(Y,X) will not terminate.

Notice now that the program in Example 4 does not terminate despite the fact that all LD-derivations are finite; this shows that in presence of grouping atoms, we have to modify slightly the classical definition of termination. The following definition relies on the fact that the programs we are referring to are stratified.

Definition 11 (Termination of Aggregate Stratified Programs). Let P be an aggregate stratified program. We say that P is well-terminating if for every well-moded atom A the following conditions hold:

1. All LD derivations of P ∪ A are finite.

2. For each LD derivation δ of P ∪ A and each grouping atom group_set(t, gl, Goal, x) selected in δ, P ∪ Goal terminates.

The classical definition of termination considers only point (1). Here however, we have grouping atoms which actually trigger a side goal which is not taken into account by (1) alone. This is the reason why we need (2) as well. Notice that the notion is well-defined thanks to the fact that programs are stratified.

To guarantee termination, we can combine the notion of stratified program above with the notion of well-acceptable program introduced by Etalle, Bossi, and Cocco in [13] (other approaches are also possible). We now show how.

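Definition 12 (Moded Level Mapping [13]). Let P be a program and BP its Herbrand base. A function | | is a moded level mapping for P iff: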

1. it is a level mapping for P, namely it is a function | | : BP → N from ground atoms to natural numbers;

2. if p(t) and p(s) coincide in the input positions, then |p(t)| = |p(s)|.

For A ∈ BP, |A| is called the level of A.

Condition (2) above states that the level of an atom is independent from the terms filling in its output positions. Finally, we can report the key concept we use in order to prove well-termination.

Definition 13 (Weakly- and Well-Acceptable [13]). Let P be a program, | | be a level mapping and M a model of P.

– A clause of P is called weakly acceptable (wrt | | and M) iff for every ground instance of it, H ← A, B, C, if M |= A and Pred(H) ≃ Pred(B) then |H| > |B|. P is called weakly acceptable wrt | | and M iff all its clauses are.

– P is called well-acceptable wrt | | and M iff | | is a moded level mapping, M is a model of P and P is weakly acceptable wrt them.

Notice that a fact is always both weakly acceptable and well-acceptable; furthermore, if MP is the least Herbrand model of P, and P is well-acceptable wrt | | and some model I, then, by the minimality of MP, P is well-acceptable wrt | | and MP as well.
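As a small illustration, consider the recursive sum/3 clause of Example 3 with sum moded (In, Out, Out) and the hypothetical level mapping |sum(l, s, n)| = length of l. It depends only on the input position, as condition 2 of Definition 12 requires, and the only relevant body atom, sum(T,Sum1,Len1), is the first one, so no model condition applies and |H| > |B| must hold for every ground instance. Symbolically, |[h|t]| = |t| + 1 > |t|; the Python sketch below merely spot-checks this on a few sampled ground instances and is not a proof.

```python
def level_sum(args):
    """Hypothetical moded level mapping for sum/3 moded (In, Out, Out):
    the length of the (ground) input list; outputs are ignored (condition 2)."""
    input_list, _sum, _len = args
    return len(input_list)

def decreases(h, t, s, l, s1, l1):
    """Ground instance sum([h|t],s,l) :- sum(t,s1,l1), s is s1+h, l is l1+1.
    Only sum(t,s1,l1) is relevant, and it is the first body atom."""
    return level_sum(([h] + t, s, l)) > level_sum((t, s1, l1))

# A few ground instances, including ones with arbitrary (even wrong) outputs
samples = [(h, t, s, l, s1, l1)
           for h in (0, 5)
           for t in ([], [1], [2, 3])
           for (s, l, s1, l1) in [(0, 0, 0, 0), (9, 1, 7, 3)]]
print(all(decreases(*x) for x in samples))                       # prints: True
print(level_sum(([1, 2], 0, 0)) == level_sum(([1, 2], 99, 7)))   # prints: True
```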

Here and in the sequel we adopt the following notation: given a program and a clause H ← . . . , B, . . . of it, we say that B is relevant iff Pred(H) ≃ Pred(B). The decrease |H| > |B| has to be checked only for the relevant atoms, because only the relevant atoms might provide recursion. Notice then that, because we additionally require programs to be stratified, grouping atoms in a clause are not relevant (they are called as subprograms).

We can now state the main result of this section.

Theorem 1. Let P be a well-moded aggregate stratified program.

– If P is well-acceptable then P is well-terminating.

Proof. (Sketch). Given a well-moded atom A, we have to prove that (a) all LD derivations starting in A are finite and that (b) for each LD derivation δ of P ∪ A and each grouping atom group_set(t, gl, Goal, x) selected in δ, P ∪ Goal terminates.

To prove (a) one can proceed exactly as done in [13], where the authors use the same notion of well-acceptable program: the fact that here we use a modified version of LD-derivation has no influence on this point, since grouping atoms are resolved by removing them, so they cannot add anything to the length of an LD derivation.

On the other hand, to prove (b) one proceeds by induction on the strata of P. Notice that at the moment the grouping atom is selected, Goal is well-moded (i.e., ground in its input positions). Now, for the base case, if Goal is defined in P1, then by (a) all LD derivations starting in Goal are finite and, since clause bodies in P1 cannot contain grouping atoms, no grouping atom is ever selected in an LD derivation starting in Goal. So P ∪ Goal terminates.

The inductive case is similar. Suppose Goal is defined in $P_{i+1}$. By (a), all LD-derivations starting in Goal are finite. Since we are in stratum $P_{i+1}$, if a grouping atom group_set(t', gl', Goal', x') is selected in an LD derivation starting in Goal, then Goal' must be defined in $P_1 \cup \dots \cup P_i$. By the inductive hypothesis, P ∪ Goal' therefore terminates. Hence the thesis.
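The induction above relies on a partition of P into strata $P_1, \dots, P_n$ with two properties. The goal of a grouping atom occurring in $P_{i+1}$ is defined in $P_1 \cup \dots \cup P_i$, and the clause bodies of $P_1$ contain no grouping atoms. The following is a minimal Python sketch of how such strata could be computed at the level of predicates. It is modelled on the standard stratification test for negation: ordinary body atoms may refer to the same or a lower stratum, while atoms inside a grouping goal must refer to a strictly lower one. We assume that this captures the paper's notion of aggregate stratification. The predicate names in the example are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# An edge (p, q, via_grouping) records that some clause defining p mentions q in its
# body, either as an ordinary atom (via_grouping=False) or inside the Goal of a
# grouping atom group_set(t, gl, Goal, x) (via_grouping=True).
Edge = Tuple[str, str, bool]


def aggregate_strata(preds: List[str], edges: List[Edge]) -> Optional[Dict[str, int]]:
    """Return a stratum (1, 2, ...) for every predicate, or None if recursion
    passes through a grouping atom, so that no stratification exists."""
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for p, q, via_grouping in edges:
            needed = stratum[q] + (1 if via_grouping else 0)
            if stratum[p] < needed:
                if needed > len(preds):
                    # A stratifiable program never needs more strata than predicates.
                    return None
                stratum[p] = needed
                changed = True
    return stratum


# Hypothetical trust-management example: grant groups the answers of recommends.
preds = ["senior", "recommends", "majority", "grant"]
edges: List[Edge] = [
    ("recommends", "senior", False),
    ("grant", "recommends", True),
    ("grant", "majority", False),
]
strata = aggregate_strata(preds, edges)
# Recursion through grouping: p groups the answers of a goal on q, and q calls p.
cyclic = aggregate_strata(["p", "q"], [("p", "q", True), ("q", "p", False)])
```

In the first example only grant ends up above the bottom stratum. The second example has no stratification, because the goal grouped by p would have to be defined strictly below p while also depending on p.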

7 Related Work

The first formal definition of aggregate functions was given by Klug in [15]. Klug also extends the relational algebra and relational calculus to support aggregate functions in a natural manner. In his approach, Klug avoids multisets: instead of defining aggregate functions directly on multisets, he introduces, for each aggregate operation, a family of functions whose domain is the set of all relations. Klug also eliminates the so-called unsafe expressions, which generate an infinite number of tuples. This motivates our use of well-termination.

Özsoyoğlu et al. in [20] further extend Klug’s algebra and calculus with set-valued attributes. Using set primitives as an alternative to aggregate subgoals is also the approach taken by Abiteboul and Beeri in [1]. However, as already observed by Kemp and Stuckey in [14], sets are mostly used where aggregate subgoals could be used instead, and the latter seem to be easier to implement in current logic programming systems. Unsafe expressions are not allowed in the work of Özsoyoğlu et al. either.

Grouping and aggregates in logic-based languages are also defined by Klug in [15] and by G. Özsoyoğlu et al. in [20]. These query languages operate on relational databases without derived relations, which means that recursion through aggregates is not an issue in their work.

Mumick et al. provide a semantics in which predicates can be specified to be sets or multisets of tuples [19]. This differs from our approach, in which we do not require full support for multisets, as most practical scenarios involving multisets can be expressed simply using lists. Mumick et al. also consider the use of the group_by operator of SQL in the class of recursive programs. They recognise that many difficulties due to recursion through the group_by construct can be avoided by using stratification.

Kemp and Stuckey present an approach similar to that of Mumick et al., and they extend well-founded and stable models to programs with aggregates [14]. In both [19] and [14] the cardinalities of multisets are not restricted. Additionally, Kemp and Stuckey explicitly allow some aggregate operations to be defined on infinite multisets. Here, we effectively rule out infinite multisets by requiring programs to be well-terminating. Both Mumick et al. in [19] and Kemp and Stuckey in [14] define group stratification, which is a straightforward extension of local stratification to programs with aggregates.

It is also worth mentioning the LDL language [8], which, like [20], uses sets as primitive objects. In LDL one can construct set-terms by “grouping” all instantiations of a term in the body of a rule, but aggregate subgoals are not used explicitly.


Moded Logic Programming is a well-researched area [18,4,7,22]. However, to the best of our knowledge, modes have never been applied to aggregates. We also extend the standard definition of a mode to include the notion of local variables. By incorporating the mode system we are able to relax some of the restrictions on the use of aggregates in logic clauses.

8 Conclusions

In this paper we present a new higher-order predicate, group_set, that can be used in a logic program to realise aggregate operations. In contrast to approaches such as those in [19] and [14], group_set does not require aggregation to be performed each time grouping takes place. Instead, the output of group_set is a multiset (expressed as a Prolog list) that can be used in many different aggregate operations.
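The separation between grouping and aggregation can be illustrated outside logic programming. The Python sketch below is only an analogy, not an implementation of group_set. It assumes that the answers of the grouped goal are already available as a list of substitutions. The parameters `template` and `group_vars` loosely play the roles of t and gl in group_set(t, gl, Goal, x), and each collected list plays the role of the binding of x. The aggregates count, average and majority are hypothetical user-defined functions applied afterwards to the same multiset; majority echoes the trust-management rule from the introduction.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# One answer substitution of the grouped goal: variable name -> value.
Answer = Dict[str, Any]


def group_answers(answers: List[Answer], template: Callable[[Answer], Any],
                  group_vars: List[str]) -> Dict[Tuple[Any, ...], List[Any]]:
    """Collect the instance of `template` for every answer, one list per value of
    `group_vars`. Duplicates are kept, so each list represents a multiset."""
    groups: Dict[Tuple[Any, ...], List[Any]] = defaultdict(list)
    for ans in answers:
        groups[tuple(ans[v] for v in group_vars)].append(template(ans))
    return dict(groups)


# Aggregates are ordinary user-defined functions on the collected list,
# so several of them can be applied to the result of a single grouping.
def count(xs: List[Any]) -> int:
    return len(xs)


def average(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def majority(votes: List[bool]) -> bool:
    return 2 * sum(votes) > len(votes)


# Hypothetical answers of a goal emp(Dept, Name, Salary).
emps = [
    {"Dept": "sales", "Name": "ann", "Salary": 3000},
    {"Dept": "sales", "Name": "bob", "Salary": 2000},
    {"Dept": "it", "Name": "eve", "Salary": 4000},
]
salaries = group_answers(emps, lambda a: a["Salary"], ["Dept"])
sales = salaries[("sales",)]
stats = (count(sales), average(sales))  # (2, 2500.0)

# Hypothetical answers of recommends(Exec, Emp, Yes) for senior executives.
recs = [
    {"Exec": "e1", "Emp": "x", "Yes": True},
    {"Exec": "e2", "Emp": "x", "Yes": False},
    {"Exec": "e3", "Emp": "x", "Yes": True},
]
grant_x = majority(group_answers(recs, lambda a: a["Yes"], ["Emp"])[("x",)])  # True
```

Grouping is done once, and both count and average are then computed on the same list. This is the flexibility that motivates keeping group_set separate from the aggregate operations.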

We show that by using modes we can relax some of the restrictions imposed on the use of grouping in logic programs. We do so by extending the definition of a mode to allow some variables in a grouping atom to be local.

Finally, we show that for the class of well-terminating aggregate stratified programs the basic properties of well-modedness and well-termination also hold for programs with grouping.

Future Work At the University of Twente we are developing a new Trust Management language, TuLiP. TuLiP is a function-free first-order language that uses modes to support distributed credential discovery. In Trust Management, the need for support for aggregate operations is widely accepted. Such support would allow one to bridge two related yet different worlds: certificate-based and reputation-based trust management. At the moment TuLiP does not support aggregate operations. We plan to incorporate the group_set operator introduced in this paper into TuLiP and to investigate its applicability to Distributed Trust Management.

Acknowledgements This work was carried out within the Freeband I-Share project.

References

1. S. Abiteboul and C. Beeri. On the power of languages for the manipulation of complex objects. Rapport De Recherche 846, INRIA, May 1988.

2. K. R. Apt. Introduction to Logic Programming. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B: Formal Models and Semantics, pages 495–574. Elsevier, Amsterdam and The MIT Press, Cambridge, 1990.

3. K. R. Apt. From Logic Programming to Prolog. Prentice Hall, 1997.

4. K. R. Apt and E. Marchiori. Reasoning about Prolog programs: from Modes through Types to Assertions. Formal Aspects of Computing, 6(6A):743–765, 1994.

5. K. R. Apt and D. Pedreschi. Reasoning about termination of pure Prolog programs. Information and Computation, 106(1):109–157, 1993.

6. K. R. Apt and D. Pedreschi. Modular termination proofs for logic and pure Prolog programs. In G. Levi, editor, Advances in Logic Programming Theory, pages 183–229. Oxford University Press, 1994.


7. K. R. Apt and A. Pellegrini. On the occur-check free Prolog programs. ACM TOPLAS, 16(3):687–726, 1994.

8. C. Beeri, S. A. Naqvi, R. Ramakrishnan, O. Shmueli, and S. Tsur. Sets and negation in a logic database language (LDL1). In PODS, pages 21–37. ACM, 1987.

9. M. Bezem. Strong termination of logic programs. Journal of Logic Programming, 15(1&2):79–97, 1993.

10. M. Blaze, J. Feigenbaum, and J. Lacy. Decentralized Trust Management. In Proc. 17th IEEE Symposium on Security and Privacy, pages 164–173. IEEE Computer Society Press, May 1996.

11. A. Bossi, N. Cocco, and M. Fabris. Norms on Terms and their use in Proving Universal Termination of a Logic Program. Theoretical Computer Science, 124:297–328, 1994.

12. D. Clarke, J.E. Elien, C. Ellison, M. Fredette, A. Morcos, and R. L. Rivest. Certificate Chain Discovery in SPKI/SDSI. Journal of Computer Security, 9(4):285–322, 2001.

13. S. Etalle, A. Bossi, and N. Cocco. Termination of well-moded programs. J. Log. Program., 38(2):243–257, 1999.

14. D. B. Kemp and P. J. Stuckey. Semantics of logic programs with aggregates. In ISLP, pages 387–401. MIT Press, 1991.

15. A. Klug. Equivalence of relational algebra and relational calculus query languages having aggregate functions. J. ACM, 29(3):699–717, 1982.

16. N. Li, J. Mitchell, and W. Winsborough. Design of a Role-based Trust-management Framework. In Proc. IEEE Symposium on Security and Privacy, pages 114–130. IEEE Computer Society Press, 2002.

17. J. W. Lloyd. Foundations of Logic Programming. Springer, 2nd edition, 1993.

18. C. S. Mellish. The Automatic Generation of Mode Declarations for Prolog Programs. DAI Research Paper 163, Department of Artificial Intelligence, Univ. of Edinburgh, August 1981.

19. I. S. Mumick, H. Pirahesh, and R. Ramakrishnan. The Magic of Duplicates and Aggregates. In Proc. 16th International Conference on Very Large Databases, pages 264–277. Morgan Kaufmann Publishers Inc., 1990.

20. G. Özsoyoğlu, Z. M. Özsoyoğlu, and V. Matos. Extending relational algebra and relational calculus with set-valued attributes and aggregate functions. ACM Trans. Database Syst., 12(4):566–592, 1987.

21. L. Plümer. Termination Proofs for Logic Programs, volume 446 of LNCS. Springer, 1990.

22. Z. Somogyi, F. Henderson, and T. Conway. Mercury: an efficient purely declarative logic programming language. In Australian Computer Science Conference, 1995. Available at http://www.cs.mu.oz.au/mercury/papers.html.

23. J. D. Ullman. Principles of Database and Knowledge-Base Systems, volume 1. Computer Science Press, 1988.