
Bottom-up tree acceptors

Citation for published version (APA):

Hemerik, C., & Katoen, J. P. (1988). Bottom-up tree acceptors. (Computing science notes; Vol. 8816). Technische Universiteit Eindhoven.

Document status and date: Published: 01/01/1988

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)



Bottom-up tree acceptors

by

C. Hemerik and J.P. Katoen

88/16


COMPUTING SCIENCE NOTES

This is a series of notes of the Computing Science Section of the Department of Mathematics and Computing Science, Eindhoven University of Technology.

Since many of these notes are preliminary versions or may be published elsewhere, they have a limited distribution only and are not for review.

Copies of these notes are available from the author or the editor.

Eindhoven University of Technology
Department of Mathematics and Computing Science
P.O. Box 513
5600 MB EINDHOVEN
The Netherlands

All rights reserved

Editors:


Bottom-up tree acceptors

C. Hemerik J. P. Katoen

Dept. of Mathematics and Computing Science Eindhoven University of Technology P.O. Box 513, 5600 MB Eindhoven, Netherlands

October 28, 1988

Abstract

This paper deals with the formal derivation of an efficient tabulation algorithm for table-driven bottom-up tree acceptors. Bottom-up tree acceptors are based on a notion of match sets. First we derive a naive acceptance algorithm using dynamic computation of match sets. Tabulation of match sets leads to an efficient acceptance algorithm, but the tables may be so large that they cannot be generated for lack of space. Introduction of a convenient equivalence relation on match sets reduces this effect and improves the tabulation algorithm.

1 Introduction

Nowadays, many parts of a compiler can be generated automatically. For instance, automatic generation of lexical and syntactic analyzers using notations based on regular expressions and context-free grammars is commonly used (see e.g. [Aho]).

However, much research is still going on in the field of universal code generator-generators, which take a description of a machine as input and deliver a (good) code generator for that machine. Code generation is an important subject in developing a compiler. The requirements traditionally imposed on a code generator are severe: the generated code must be correct and must utilize the resources of the machine (such as registers) efficiently. A particular and relevant issue in code generation is instruction selection, which forms the subject of the remainder of this section. The nature of the instruction set and the addressing modes of the target machine determine the difficulty of instruction selection. As an example, we illustrate the instruction selection of an expression for a register machine with a very simple instruction set. First the addressing modes are presented.

Example 1.1

  addressing mode   format    meaning
  immediate         #c        c
  register          Ri        Ri
  indexed           c(Ri)     M(c + Ri)
  indirect          *Ri       M(Ri)

□

Here, M(a) denotes the contents of address a; c is a constant and Ri is a register. Suppose our target machine supports the following instructions.

Example 1.2

  instruction           definition
  (1) MOV #c, Ri        Ri := c
  (2) MOV *Rj, Ri       Ri := M(Rj)
  (3) MOV c(Rj), Ri     Ri := M(c + Rj)
  (4) ADD Ri, Rj        Ri := Ri + Rj

□

Now consider the expression R1 := c1 + M(c2 + R2). Using the instructions given above we may derive an instruction sequence for this expression as follows. At each step a subexpression is selected (underlined in the original) and coded.

Example 1.3

  expression                 instruction
  R1 := c1 + M(c2 + R2)     MOV #c2, R3
  R1 := c1 + M(R3 + R2)     ADD R2, R3
  R1 := c1 + M(R2)          MOV *R2, R2
  R1 := c1 + R2             MOV #c1, R1
  R1 := R1 + R2             ADD R1, R2

□

Observe that the derivation above looks like the parsing of a string. By replacing the definition of an instruction, which is of the form Ri := ..., by a production rule of the form Ri → ..., definitions (1)-(4) of example 1.2 may be considered as a code generation grammar.

The above suggests using traditional parsing techniques for code generation. For instance, Graham & Glanville use LR-parsing for instruction selection (see [Graham]). The main problem with this approach is the resolution of the large number of parsing conflicts caused by the fact that code generation grammars are inherently highly ambiguous. For example, an alternative instruction sequence may be derived for the expression R1 := c1 + M(c2 + R2) as follows:

Example 1.4

  expression                 instruction
  R1 := c1 + M(c2 + R2)     MOV c2(R2), R2
  R1 := c1 + R2             MOV #c1, R1
  R1 := R1 + R2             ADD R1, R2

□

A way to overcome this problem could be to use a more general parsing method like Earley parsing, as suggested in [Christopher a], but the resulting space and time complexity is unacceptable for practical use in code generators.

Neither of these methods just mentioned takes into account a special property of code generation grammars, viz. that every operator symbol has a fixed rank. Using this fact leads to the idea of considering the tree representation of an expression, rather than its string representation. To this end, the code generation (string) grammar becomes a so-called tree grammar. For the instructions of example 1.2 the production rules, represented by trees, become:

Example 1.5

For each instruction of example 1.2 — MOV #c, Ri; MOV *Rj, Ri; MOV c(Rj), Ri; ADD Ri, Rj — the corresponding production rule, of the form Ri → t, is drawn as a tree in the original; the tree diagrams are not reproduced in this text version.

□

Several code generation algorithms based on tree grammars have been described [Aho, Christopher b, Turner], but a theoretical framework is painfully missing. This is all the more remarkable as a well-developed theory of tree grammars and tree automata has existed for some twenty years [Brainerd, Doner, Rounds, Hoffmann]. A systematic treatment of this theory, aimed at code generation applications, is given in [Van Dinther]; a survey paper is in preparation [Hemerik]. In this paper we consider a particular class of tree acceptors, called deterministic bottom-up tree acceptors, which have a time complexity proportional to the size of the tree to be analyzed. They can easily be extended to bottom-up tree parsers. Our main purpose is to present algorithms


for the efficient generation of compressed parse tables, and to show how these rather complex programs can be derived systematically.

The organization of this paper is as follows: in sections 2 and 3 we present a simplified treatment of the theory of tree grammars and deterministic bottom-up tree acceptors. Section 4 shows how the transition functions of the acceptor can be tabulated, leading to a linear-time acceptance algorithm. In practical applications, however, the size of the resulting tables may be prohibitive. Therefore in section 5, using ideas of [Chase], an improved algorithm is described which generates compressed transition tables. Finally, section 6 contains some concluding remarks.

2 Tree grammars

In this section we define the basic concepts of the theory of tree grammars. Readers familiar with context-free (string) grammars will notice that tree grammars are a generalization of them.

Definition 2.1 { ranked alphabet }

A ranked alphabet is a pair (V, r) such that V is a finite set and r ∈ V → N.

□

Elements of V are called symbols and r(a) is called the rank of symbol a. In the following, Vn denotes the set of symbols with rank n, that is, Vn = {v ∈ V | r(v) = n}.

Definition 2.2 { Tree(V, r), trees over a ranked alphabet }

The set Tree(V, r) of trees over a ranked alphabet (V, r) is the smallest set X such that:

• V0 ⊆ X
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀t1, ..., tn ∈ X : a(t1, ..., tn) ∈ X

□
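Definition 2.2 translates directly into code. The sketch below is ours, not the paper's: it represents a tree a(t1, ..., tn) as a nested tuple and checks membership in Tree(V, r) against a rank function; the symbols and ranks used are those of the running example of section 2.

```python
# A minimal sketch (not from the paper): trees over a ranked alphabet (V, r)
# as nested tuples ("a", t1, ..., tn); leaves of rank 0 are bare symbols.

def is_tree(t, rank):
    """Check t ∈ Tree(V, r), where rank maps each symbol of V to its rank."""
    if isinstance(t, str):                       # a leaf must have rank 0
        return rank.get(t) == 0
    sym, children = t[0], t[1:]
    return (rank.get(sym) == len(children)       # arity must equal the rank
            and all(is_tree(c, rank) for c in children))

rank = {"a": 2, "b": 1, "c": 0, "d": 0}          # ranks of the running example
print(is_tree(("a", ("b", "c"), "d"), rank))     # a(b(c), d): True
print(is_tree(("b", "c", "d"), rank))            # b with two children: False
```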

Definition 2.3 { tree grammar }

A tree grammar G is a 5-tuple (N, V, r, P, S) such that:

• (N ∪ V, r) is a ranked alphabet such that ∀A ∈ N : r(A) = 0
• N ∩ V = ∅
• P is a finite subset of N × Tree(N ∪ V, r)
• S ∈ N

□

Elements of N, V, P are called nonterminals, terminals, and production rules, respectively. S is called the start symbol of G. Notational remark: upper-case letters are used to denote nonterminals, lower-case letters stand for terminals. An element (A, t) ∈ P is usually written as A → t; A is sometimes called the left-hand side, t the right-hand side of the production rule.

Definition 2.4 { ⇒, derivation step (informal) }

Let (N, V, r, P, S) be a tree grammar. ∀t1, t2 ∈ Tree(N ∪ V, r) :

  t1 ⇒ t2 if ∃(A → α) ∈ P such that t2 can be obtained from t1 by substituting α for one occurrence of A in t1

□

⇒* is the reflexive and transitive closure of ⇒. We say that t is derivable from A if A ⇒* t. The trees containing only terminals that are derivable from the start symbol constitute the language generated by the tree grammar.

Definition 2.5 { ℒ, language generated by a tree grammar }

Let G = (N, V, r, P, S) be a tree grammar.

• the function ℒ ∈ N → P(Tree(V, r)) is defined by: ∀A ∈ N : ℒ(A) = {t ∈ Tree(V, r) | A ⇒* t}
• the language generated by G is ℒ(S)

□

The tree grammar defined in the following example will be used as a running example throughout this paper.

Example 2.6

Let G = (N, V, r, P, A) be a tree grammar, where N = {A, B}; V = {a, b, c, d}; r(a) = 2; r(b) = 1; r(c) = 0; r(d) = 0;
P = { (1) A → a(b(c), B), (2) A → a(B, d), (3) A → c, (4) B → b(B), (5) B → A, (6) B → d }

An alternative presentation of the elements of P is given in figure 1. Some examples of derivations are:

[Figure 1: Production rules represented as trees — tree diagrams of rules (1)-(6) of example 2.6; not reproduced in this text version.]

A ⇒ a(b(c), B) ⇒ a(b(c), d).
A ⇒ a(B, d) ⇒ a(b(B), d) ⇒ a(b(A), d) ⇒ a(b(c), d).

Some elements of ℒ(A) are: c, a(b(c), d), a(a(b(c), b(d)), d).

□

3 Tree acceptors

A tree acceptor is a tree automaton which, given a tree grammar G = (N, V, r, P, S) and a tree t ∈ Tree(V, r), establishes whether t ∈ ℒ(S). In this section we consider a particular kind of tree acceptor, viz. the deterministic bottom-up tree acceptor, although we shall not stress the automata-theoretic concepts. The basic idea underlying this kind of acceptor is to extract from the grammar G a set PS of patterns, i.e. subtrees of right-hand sides of production rules, and to compute for a tree t the match set MS1(t) of patterns from which it may be derived. The tree t is accepted if and only if S ∈ MS1(t). As the match set of a composite tree can be computed simply from the match sets of its direct subtrees, an acceptor can be obtained which operates in time proportional to the size of the tree.

The following definitions are all relative to a given tree grammar G = (N, V, r, P, S). We assume that G has no useless terminals and nonterminals, i.e. every (non)terminal occurs in some tree derivable from S and for all A ∈ N : ℒ(A) ≠ ∅.

Definition 3.1 { sub, subtree relation }

∀n : 1 ≤ n : ∀a ∈ Vn : ∀t1, ..., tn ∈ Tree(N ∪ V, r) : ∀i : 1 ≤ i ≤ n : ti sub a(t1, ..., tn)

□

sub* is the reflexive and transitive closure of sub.

Definition 3.2 { PS, pattern set }

PS = {t ∈ Tree(N ∪ V, r) | ∃A, t' : A ∈ N ∧ t' ∈ Tree(N ∪ V, r) : (A → t') ∈ P ∧ t sub* t'}

□

Notice that N ⊆ PS holds, since every nonterminal occurs in some tree derivable from S. The closure of a pattern s is the set of patterns containing s and the nonterminals from which s is derivable. Similarly, the closure of a set of patterns is defined as follows.

Definition 3.3 { closure }

• the function closure ∈ P(PS) → P(PS) is defined by:
• ∀s ∈ P(PS) : closure(s) = s ∪ {A ∈ N | ∃α ∈ s : A ⇒* α}

□

Apparently, for all s ∈ P(PS) : closure(s) ⊆ PS. There are various ways to handle the acceptance problem. One possible way, commonly known as the up method (or bottom-up pattern matching), is to derive the start symbol S starting from a given tree t. The bottom-up method relies on the notion of match sets: sets of subpatterns that match at a particular tree node. These sets are defined recursively as follows.

Definition 3.4 { MS1, match set }

• the function MS1 ∈ Tree(V, r) → P(PS) is defined by:
• ∀a ∈ V0 : MS1(a) = closure({a})
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀t1, ..., tn ∈ Tree(V, r) :
  MS1(a(t1, ..., tn)) = closure({a(p1, ..., pn) ∈ PS | ∀i : 1 ≤ i ≤ n : pi ∈ MS1(ti)})

□

The relevance of match sets is stated in the following lemma.

Lemma 3.5 ∀t ∈ Tree(V, r) : MS1(t) = {t' ∈ PS | t' ⇒* t}

Proof: by structural induction over Tree(V, r).

1. base step: let a ∈ V0.

     MS1(a)
   = {definition 3.4}
     closure({a})
   = {definition 3.3}
     {a} ∪ {A ∈ N | A ⇒* a}
   = {N ⊆ PS; a ∈ PS ∧ a ⇒* a}
     {t' ∈ PS | t' ⇒* a}

2. induction step: let a ∈ Vn, 1 ≤ n, then

     MS1(a(t1, ..., tn))
   = {definition 3.4}
     closure({a(p1, ..., pn) ∈ PS | ∀i : 1 ≤ i ≤ n : pi ∈ MS1(ti)})
   = {induction hypothesis}
     closure({a(p1, ..., pn) ∈ PS | ∀i : 1 ≤ i ≤ n : pi ∈ {t' ∈ PS | t' ⇒* ti}})
   = {set calculus}
     closure({a(p1, ..., pn) ∈ PS | ∀i : 1 ≤ i ≤ n : pi ⇒* ti})
   = {definition 2.4}
     closure({a(p1, ..., pn) ∈ PS | a(p1, ..., pn) ⇒* a(t1, ..., tn)})
   = {definition 3.3; ∀s ∈ P(PS) : closure(s) ⊆ PS}
     {t' ∈ PS | t' ⇒* a(t1, ..., tn)}

□

Lemma 3.6 ∀t ∈ Tree(V, r) : S ∈ MS1(t) ⇔ t ∈ ℒ(S)

Proof: use definition 2.5 and lemma 3.5. □

So, by computing MS1(t) for a given tree t, it is rather simple to decide whether or not t belongs to the language generated by G.

Example 3.7

Consider the grammar of example 2.6. Its pattern set is: PS = {a(b(c),B), b(c), c, B, a(B,d), d, b(B), A}.

For t = a(b(c), b(a(d,d))) the bottom-up computation of MS1(t) is depicted in figure 2, where each node of t is annotated with the match set of the subtree rooted at that node. It shows that A ∈ MS1(t), hence (see lemma 3.6), t ∈ ℒ(A).

□

Notational remark: elements of a match set added by a closure operation are separated from the other elements by a semicolon.

[Figure 2: An example of dynamic computation of match sets for t = a(b(c), b(a(d,d))) — node annotations: c: {c; A,B}, b(c): {b(c), b(B); B}, d: {d; B}, a(d,d): {a(B,d); A,B}, b(a(d,d)): {b(B); B}, root: {a(b(c),B); A,B}.]
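The dynamic computation of figure 2 can be replayed with a short program. The following sketch is ours (helper names such as `subtrees` are not from the paper); it implements definitions 3.2-3.4 for the running example, recomputing match sets at every node — exactly the inefficiency that tabulation removes in section 4.

```python
# Naive, dynamic computation of match sets (definitions 3.2-3.4), sketched
# for the grammar of example 2.6. Trees and patterns are nested tuples.

P = [("A", ("a", ("b", "c"), "B")), ("A", ("a", "B", "d")), ("A", "c"),
     ("B", ("b", "B")), ("B", "A"), ("B", "d")]

def subtrees(t):
    yield t
    if isinstance(t, tuple):
        for child in t[1:]:
            yield from subtrees(child)

PS = {s for _, rhs in P for s in subtrees(rhs)}      # pattern set (def. 3.2)

def closure(s):                                       # definition 3.3
    s = set(s)
    while True:
        grow = {A for A, rhs in P if rhs in s} - s    # derivable left-hand sides
        if not grow:
            return s
        s |= grow

def MS1(t):                                           # definition 3.4
    if isinstance(t, str):
        return closure({t})
    kids = [MS1(c) for c in t[1:]]
    return closure({p for p in PS if isinstance(p, tuple) and p[0] == t[0]
                    and len(p) == len(t)
                    and all(p[i + 1] in kids[i] for i in range(len(kids)))})

t = ("a", ("b", "c"), ("b", ("a", "d", "d")))
print(MS1(("a", "d", "d")))   # {('a','B','d'), 'A', 'B'} — as in figure 2
print("A" in MS1(t))          # True: t is accepted (lemma 3.6)
```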

4 Tabulation of match sets

A program computing MS1 is easily implementable, but very inefficient: each time the acceptance of a tree is determined, the match sets (and closures) must be recalculated. Fortunately, since P(PS) is a finite set (due to the fact that PS is finite), the number of match sets is finite. This opens the possibility of tabulating the match sets. Observe that MS1(a(t1, ..., tn)) is of the form fa(MS1(t1), ..., MS1(tn)), where fa is defined as follows.

Definition 4.1 { fa, transition function for symbol a }

• ∀a ∈ V0 : fa ∈ P(PS), where fa = closure({a})
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀s1, ..., sn ∈ P(PS) : fa ∈ P(PS)^n → P(PS), where
  fa(s1, ..., sn) = closure({a(p1, ..., pn) ∈ PS | ∀i : 1 ≤ i ≤ n : pi ∈ si})

□

Obviously all fa's have a finite domain (since the number of match sets is finite), so we may tabulate fa. This means that we will have an n-dimensional table for a symbol of rank n. We do not have to tabulate fa for the entire powerset P(PS), but only for its reachable part, i.e. the smallest set Z ⊆ P(PS) closed under all fa's.

To compute the reachable part Z and the tabulation ta ∈ Z^n → Z of fa, where n = r(a), the standard reachability algorithm (see e.g. [Rem]) is used. This leads to the following algorithm, which is called A1.

|[ var x, y : P(PS); Z, W, G : P(P(PS))
 | Z, W, G := ∅, P(PS), ∅
 ; for all a ∈ V0
   do y := fa ; W, G := W \ {y}, G ∪ {y} ; ta := y od
 ; do G ≠ ∅ →
     x :∈ G
   ; for all a ∈ V \ V0 do
       |[ var n : N
        | n := r(a)
        ; for all (s1, ..., sn) ∈ (Z ∪ {x})^n \ Z^n
          do y := fa(s1, ..., sn)
           ; if y ∈ W → W, G := W \ {y}, G ∪ {y} [] y ∉ W → skip fi
           ; ta(s1, ..., sn) := y
          od
       ]|
     od
   ; G, Z := G \ {x}, Z ∪ {x}
   od
]|
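As a cross-check, the reachability computation can be sketched in Python (our code, not the paper's program). The tables are keyed by the match sets themselves, as frozensets, so no enumeration is needed; for the running example the fixpoint finds exactly 8 reachable match sets, as in example 4.9 below.

```python
from itertools import product

# Reachability tabulation in the spirit of algorithm A1, sketched for the
# running example. Names here are ours, not the paper's.

rank = {"a": 2, "b": 1, "c": 0, "d": 0}
P = [("A", ("a", ("b", "c"), "B")), ("A", ("a", "B", "d")), ("A", "c"),
     ("B", ("b", "B")), ("B", "A"), ("B", "d")]

def subtrees(t):
    yield t
    if isinstance(t, tuple):
        for child in t[1:]:
            yield from subtrees(child)

PS = {s for _, rhs in P for s in subtrees(rhs)}

def closure(s):
    s = set(s)
    while True:
        grow = {A for A, rhs in P if rhs in s} - s
        if not grow:
            return frozenset(s)
        s |= grow

def f(a, args):                          # transition function (definition 4.1)
    return closure({p for p in PS if isinstance(p, tuple) and p[0] == a
                    and all(p[i + 1] in args[i] for i in range(len(args)))})

table = {a: closure({a}) for a in rank if rank[a] == 0}   # leaf transitions
Z = set(table.values())                  # reachable match sets found so far
while True:
    new = {(a, args): f(a, args)
           for a in rank if rank[a] > 0
           for args in product(Z, repeat=rank[a])}
    if set(new.values()) <= Z:           # no unseen match set: fixpoint reached
        table.update(new)
        break
    Z |= set(new.values())

def ms(t):                               # table-driven match set of t
    if isinstance(t, str):
        return table[t]
    return table[(t[0], tuple(ms(c) for c in t[1:]))]

print(len(Z))                            # 8 — matches example 4.9
print("A" in ms(("a", ("b", "c"), ("b", ("a", "d", "d")))))   # True
```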

To allow indexing of transition tables with numbers rather than match sets, we introduce an enumeration E ∈ N →p Z of the match sets; E is an injection. The definition of transition tables is stated in terms of the enumeration as follows.

Definition 4.2 { Ta, transition table for symbol a }

• ∀a ∈ V0 : Ta ∈ dom(E), where E(Ta) = fa
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀s1, ..., sn ∈ dom(E) : Ta ∈ dom(E)^n → dom(E), where
  E(Ta(s1, ..., sn)) = fa(E(s1), ..., E(sn))

□

The corresponding definition of the match set now reads as follows.


Definition 4.3 { MS2, match set }

• the function MS2 ∈ Tree(V, r) → dom(E) is defined by:
• ∀a ∈ V0 : MS2(a) = Ta
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀t1, ..., tn ∈ Tree(V, r) : MS2(a(t1, ..., tn)) = Ta(MS2(t1), ..., MS2(tn))

□

The correspondence between MS1 and MS2 is stated in the following lemma.

Lemma 4.4 ∀t ∈ Tree(V, r) : MS1(t) = E(MS2(t))

Proof: by structural induction over Tree(V, r), using definitions 3.4 and 4.1-4.3. □

Match sets containing the start symbol S have a special meaning (see lemma 3.6) and are called accepting states.

Definition 4.5 { F, accepting states }

F = {n ∈ dom(E) | S ∈ E(n)}

□

Lemma 4.6 ∀t ∈ Tree(V, r) : t ∈ ℒ(S) ⇔ MS2(t) ∈ F

Proof: use definition 4.5 and lemmata 4.4 and 3.6. □

As can be observed from definition 3.3, taking the closure of a set of patterns just consists of deriving some left-hand sides of production rules. This calculation can be simplified by tabulating the closure of nonterminals in a table Nclosure. Formally:

Definition 4.7 { Nclosure }

• Nclosure ∈ N → P(N)
• ∀A ∈ N : Nclosure(A) = {B ∈ N | B ⇒* A}

□

Computing the Nclosure of a nonterminal is nothing more than determining the reflexive and transitive closure of P ∩ (N × N). We use Warshall's algorithm to calculate the transitive closure.

The relation between closure and Nclosure is stated in lemma 4.8.

Lemma 4.8 ∀s ∈ P(PS) : closure(s) = s ∪ (∪ A, α : (A → α) ∈ P ∧ α ∈ s : Nclosure(A))

Proof: use definitions 3.3, 2.4, and 4.7. □

The elaborated version of algorithm A1 (named A2) is presented below. Here, the sets Z, G, and W are characterized by {E(i) | 0 ≤ i < p}, {E(i) | p ≤ i < q}, and P(PS) \ (Z ∪ G), respectively.

|[ con G = (N, V, r, P, S) : tree grammar
 ; var E : N →p P(PS)
       F : P(N)
       p, q : N
       Ta : N               (for all a ∈ V0)
       Ta : N^n → N         (for all n : 1 ≤ n : a ∈ Vn)
       Nclosure : N → P(N)
 ; proc compute_Nclosure =
   |[ (* first, transitive closure by means of Warshall's algorithm *)
      for all A ∈ N do Nclosure(A) := {B ∈ N | (B → A) ∈ P} od
    ; for all B ∈ N do
        for all A ∈ N do
          if B ∈ Nclosure(A) → Nclosure(A) := Nclosure(A) ∪ Nclosure(B)
          [] B ∉ Nclosure(A) → skip
          fi
        od
      od
      (* second, reflexive closure *)
    ; for all A ∈ N do Nclosure(A) := Nclosure(A) ∪ {A} od
   ]|
 ; func closure = (| s : P(PS) | P(PS) |)
   |[ var r : P(PS)
    | r := s
    ; for all (A → α) ∈ P
      do if α ∈ s → r := r ∪ Nclosure(A) [] α ∉ s → skip fi od
    | r
   ]|
   (* main program *)
 | compute_Nclosure()
 ; q := 0 ; p := 0
 ; for all a ∈ V0
   do E(q) := closure({a}) (* new match set *)
    ; Ta, q := q, q + 1
   od
 ; do p ≠ q →
     for all a ∈ V \ V0 do
       |[ var n : N
        | n := r(a)
        ; for all (p1, ..., pn) ∈ {0, ..., p}^n \ {0, ..., p−1}^n
          do E(q) := closure({a(t1, ..., tn) ∈ PS | ∀i : 1 ≤ i ≤ n : ti ∈ E(pi)})
           ; |[ var k : N
              | k := 0
              ; do E(k) ≠ E(q) → k := k + 1 od
              ; if k = q → q := q + 1 (* new match set *)
                [] k ≠ q → skip
                fi
              ; Ta(p1, ..., pn) := k
             ]|
          od
       ]|
     od
   ; p := p + 1
   od
 ; p := 0 ; F := ∅
   (* determine accepting states *)
 ; do p ≠ q →
     if S ∈ E(p) → F := F ∪ {p} [] S ∉ E(p) → skip fi
   ; p := p + 1
   od
]|

Given the transition tables, the acceptance problem is easily solved: some simple table look-ups do the job. The table-driven acceptor is described by the following algorithm. The time complexity of this program is proportional to the size of the tree.

|[ con F : P(N)
      Ta : N               (for all a ∈ V0)
      Ta : N^n → N         (for all n : 1 ≤ n : a ∈ Vn)
      t : Tree(V, r)
 ; var accepted : bool
 ; func ms2 = (| t : Tree(V, r) | N |)
   | if t = a → Ta
     [] t = a(t1, ..., tn) → Ta(ms2(t1), ..., ms2(tn))
     fi
 | accepted := ms2(t) ∈ F
]|

Example 4.9

Consider the grammar of example 2.6. The tables generated by algorithm A2 are:

  E | match set                  | Ta(E, 0..7)      | Tb
  0 | {c; A, B}                  | 2 3 2 2 2 2 2 2  | 4
  1 | {d; B}                     | 2 3 2 2 2 2 2 2  | 5
  2 | ∅                          | 2 2 2 2 2 2 2 2  | 2
  3 | {a(B,d); A, B}             | 2 3 2 2 2 2 2 2  | 5
  4 | {b(c), b(B); B}            | 6 7 2 6 6 6 6 6  | 5
  5 | {b(B); B}                  | 2 3 2 2 2 2 2 2  | 5
  6 | {a(b(c),B); A, B}          | 2 3 2 2 2 2 2 2  | 5
  7 | {a(B,d), a(b(c),B); A, B}  | 2 3 2 2 2 2 2 2  | 5

  Tc = 0, Td = 1; F = {0, 3, 6, 7}

Table-driven bottom-up accepting of an input tree, for instance t = a(b(c), b(a(d,d))), proceeds as demonstrated in figure 3. Each node of t is now annotated with the number of the match set it was annotated with in figure 2.

Since 6 ∈ F, t ∈ ℒ(A).

□
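The table look-ups of figure 3 are easy to replay. The sketch below (ours) hard-codes the tables of example 4.9 and runs a table-driven acceptor on t = a(b(c), b(a(d,d))).

```python
# The transition tables of example 4.9, hard-coded, driving the acceptor.

Ta = [[2, 3, 2, 2, 2, 2, 2, 2],   # row i, column j: Ta(i, j)
      [2, 3, 2, 2, 2, 2, 2, 2],
      [2, 2, 2, 2, 2, 2, 2, 2],
      [2, 3, 2, 2, 2, 2, 2, 2],
      [6, 7, 2, 6, 6, 6, 6, 6],
      [2, 3, 2, 2, 2, 2, 2, 2],
      [2, 3, 2, 2, 2, 2, 2, 2],
      [2, 3, 2, 2, 2, 2, 2, 2]]
Tb = [4, 5, 2, 5, 5, 5, 5, 5]
Tc, Td = 0, 1
F = {0, 3, 6, 7}                   # accepting states

def ms2(t):
    if t == "c": return Tc
    if t == "d": return Td
    if t[0] == "b": return Tb[ms2(t[1])]
    return Ta[ms2(t[1])][ms2(t[2])]     # t = a(t1, t2)

t = ("a", ("b", "c"), ("b", ("a", "d", "d")))
print(ms2(t))          # 6, via Tc=0, Tb(0)=4, Td=1, Ta(1,1)=3, Tb(3)=5, Ta(4,5)=6
print(ms2(t) in F)     # True: t is accepted
```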

[Figure 3: An example of table-driven acceptance — each node of t = a(b(c), b(a(d,d))) is annotated with its match set number: Tc = 0, Tb(0) = 4, Td = 1, Ta(1,1) = 3, Tb(3) = 5, Ta(4,5) = 6.]

5 Optimized tabulation

In practice, code generation tree grammars (see introduction) are rather extensive. This means that transition tables may be very large. Compression of these tables can be applied after computation, but the uncompressed tables may be so large that they cannot be generated at all, even if the compressed tables would be of manageable size. However, a considerable improvement is possible.

The optimization is based on an equivalence relation on match sets. The basic idea behind the equivalence relation is the observation that some patterns only occur as the j-th subtree of a tree labelled with a symbol of rank n (n ≥ j). The main advantage is that in the generation of match sets one can iterate over the equivalence classes instead of over the match sets. This is quite lucrative, provided that the mapping of match sets onto equivalence classes is not (nearly) a bijection (in which case no improvement is made).

The idea of this optimization originated with David Chase, but he only gave an informal treatment of his ideas (see [Chase]). Here, we derive an improved algorithm for the generation of match sets based on the material presented in [Chase].

The j-th childset of a symbol a, 1 ≤ j ≤ r(a), is the set of patterns that appear as the j-th subtree of a tree in PS labelled with a. Formally:

Definition 5.1 { CSa,j, j-th childset of symbol a }

∀n : 1 ≤ n : ∀a ∈ Vn : ∀j : 1 ≤ j ≤ n :
CSa,j = {t ∈ Tree(N ∪ V, r) | ∃t', t1, ..., tn ∈ Tree(N ∪ V, r) : t' ∈ PS ∧ t' = a(t1, ..., tn) ∧ tj = t}

□

Example 5.2

The childsets of the symbols of the grammar of example 2.6 are: CSa,1 = {b(c), B}, CSa,2 = {B, d}, and CSb,1 = {c, B}.

□
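Definition 5.1 and example 5.2 can be sketched as a one-pass scan over PS (our code, not the paper's):

```python
# Childsets CS_{a,j} (definition 5.1): every pattern occurring as the j-th
# subtree of a pattern in PS labelled a. Sketched for the grammar of ex. 2.6.

P = [("A", ("a", ("b", "c"), "B")), ("A", ("a", "B", "d")), ("A", "c"),
     ("B", ("b", "B")), ("B", "A"), ("B", "d")]

def subtrees(t):
    yield t
    if isinstance(t, tuple):
        for child in t[1:]:
            yield from subtrees(child)

PS = {s for _, rhs in P for s in subtrees(rhs)}

CS = {}
for p in PS:
    if isinstance(p, tuple):                       # p = a(t1, ..., tn)
        for j, child in enumerate(p[1:], start=1):
            CS.setdefault((p[0], j), set()).add(child)

print(CS[("a", 1)])    # {('b', 'c'), 'B'} — CS_{a,1} of example 5.2
print(CS[("a", 2)])    # {'B', 'd'}
print(CS[("b", 1)])    # {'c', 'B'}
```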

Lemma 5.3 ∀n : 1 ≤ n : ∀a ∈ Vn : ∀j : 1 ≤ j ≤ n : CSa,j ⊆ PS

Proof: PS is closed under taking subtrees. □


Using the childsets we may refine the definition of match set as follows.

Definition 5.4 { MS1, match set }

• the function MS1 ∈ Tree(V, r) → P(PS) is defined by:
• ∀a ∈ V0 : MS1(a) = closure({a})
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀t1, ..., tn ∈ Tree(V, r) :
  MS1(a(t1, ..., tn)) = closure({a(p1, ..., pn) ∈ PS | ∀i : 1 ≤ i ≤ n : pi ∈ MS1(ti) ∩ CSa,i})

□

The only difference with definition 3.4 is the intersection with CSa,i. This does not affect the value of MS1(a(t1, ..., tn)), because the only patterns missing from MS1(ti) ∩ CSa,i are those patterns that do not appear as the i-th subtree of a tree in PS labelled with a.

For the same reasons as mentioned in section 4, a program computing MS1 is easily implementable but rather inefficient. Again, tabulation is possible. Observe that MS1(a(t1, ..., tn)) is now of the form fa(ga,1(MS1(t1)), ..., ga,n(MS1(tn))), where fa is defined as before (see definition 4.1) and ga,j is defined as follows.

Definition 5.5 { ga,j, map function for the j-th child of symbol a }

∀n : 1 ≤ n : ∀a ∈ Vn : ∀j : 1 ≤ j ≤ n :

• the function ga,j ∈ P(PS) → P(PS) is defined by:
• ∀s ∈ P(PS) : ga,j(s) = s ∩ CSa,j

□

For practical reasons we use an enumeration E ∈ N →p P(PS) of match sets. The definition of transition tables is stated in terms of the map function and the enumeration E as follows.

Definition 5.6 { Ta, transition table for symbol a }

• ∀a ∈ V0 : Ta ∈ dom(E), where E(Ta) = fa
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀s1, ..., sn ∈ dom(E) : Ta ∈ dom(E)^n → dom(E), where
  E(Ta(s1, ..., sn)) = fa(ga,1(E(s1)), ..., ga,n(E(sn)))

□

Apparently, these transition tables are similar to those defined in definition 4.2.

An important observation is that intersection of a match set with some childset CSa,j, for a symbol a ∈ Vn, 1 ≤ n, and j : 1 ≤ j ≤ n, induces an equivalence relation over match sets.


Definition 5.7 { equivalence relation ≈a,j }

∀s, s' ∈ P(PS) : ∀n : 1 ≤ n : ∀a ∈ Vn : ∀j : 1 ≤ j ≤ n : s ≈a,j s' ⇔ s ∩ CSa,j = s' ∩ CSa,j

□

Definition 5.8 { equivalence class [s]a,j }

∀s ∈ P(PS) : ∀n : 1 ≤ n : ∀a ∈ Vn : ∀j : 1 ≤ j ≤ n : [s]a,j = {s' ∈ P(PS) | s ≈a,j s'}

□

In other words: the equivalence class [s]a,j is the set of all match sets that are equivalent (under ≈a,j) to s. An equivalence class [s]a,j is represented by s ∩ CSa,j, which is called the representer-set of [s]a,j.

Tabulation of representer-sets is possible, since the number of equivalence classes is finite. We introduce, for all n : 1 ≤ n, for all a ∈ Vn, for all j : 1 ≤ j ≤ n, an enumeration Ra,j ∈ N →p P(PS) of representer-sets. The mapping of (the enumeration of) match sets onto (the enumeration of) representer-sets is performed by an index map table µa,j.

Definition 5.9 { µa,j, index map table for the j-th child of symbol a }

∀n : 1 ≤ n : ∀a ∈ Vn : ∀j : 1 ≤ j ≤ n :

• the function µa,j ∈ dom(E) → dom(Ra,j) is defined by:
• ∀s ∈ dom(E) : Ra,j(µa,j(s)) = E(s) ∩ CSa,j

□
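A small sketch (ours) of definitions 5.7-5.9: intersecting the 8 match sets of example 4.9 with each childset collapses them into a handful of representer-sets, and µ records the index map. Scanning E(0)..E(7) in order happens to reproduce the index tables of example 5.12 below, but only the class counts are order-independent.

```python
# Representer-sets R_{a,j} and index maps µ_{a,j} (definitions 5.7-5.9),
# computed from the match sets of example 4.9 and childsets of example 5.2.

E = [frozenset(s) for s in (
    {"c", "A", "B"}, {"d", "B"}, set(), {("a", "B", "d"), "A", "B"},
    {("b", "c"), ("b", "B"), "B"}, {("b", "B"), "B"},
    {("a", ("b", "c"), "B"), "A", "B"},
    {("a", "B", "d"), ("a", ("b", "c"), "B"), "A", "B"})]

CS = {("a", 1): {("b", "c"), "B"}, ("a", 2): {"B", "d"}, ("b", 1): {"c", "B"}}

R, mu = {}, {}
for key, cs in CS.items():
    reps = []                              # enumeration R_{a,j} of representer-sets
    for i, s in enumerate(E):
        rep = frozenset(s & cs)            # representer-set of [s]_{a,j} (def. 5.8)
        if rep not in reps:
            reps.append(rep)               # first time this class is seen
        mu[key, i] = reps.index(rep)       # index map µ_{a,j} (def. 5.9)
    R[key] = reps

print([len(R[k]) for k in CS])   # [3, 3, 3]: 8 table rows shrink to 3 classes each
print(mu[("a", 2), 4])           # 0: E(4) ∩ CS_{a,2} = {B} = R_{a,2}(0)
print(mu[("b", 1), 4])           # 1: E(4) ∩ CS_{b,1} = {B} = R_{b,1}(1)
```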

Notice that µa,j is, in fact, nothing else than the tabulation of ga,j defined above. This means that transitions have to be tabulated for representer-sets only (instead of for all match sets). This is reflected in the following definition.

Definition 5.10 { T'a, transition table for symbol a }

• ∀a ∈ V0 : T'a ∈ dom(E), where E(T'a) = fa
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀j : 1 ≤ j ≤ n : ∀s1, ..., sn ∈ dom(E) :
  - µa,j ∈ dom(E) → dom(Ra,j)
  - T'a ∈ (dom(Ra,1) × ... × dom(Ra,n)) → dom(E)
  - E(T'a(µa,1(s1), ..., µa,n(sn))) = fa(ga,1(E(s1)), ..., ga,n(E(sn)))

□

The corresponding definition of match set is changed into:

Definition 5.11 { MS2, match set }

• ∀a ∈ V0 : MS2(a) = T'a
• ∀n : 1 ≤ n : ∀a ∈ Vn : ∀t1, ..., tn ∈ Tree(V, r) :
  MS2(a(t1, ..., tn)) = T'a(µa,1(MS2(t1)), ..., µa,n(MS2(tn)))

□

Notice that MS2 is still related to MS1 by lemma 4.4.

Using the definitions above and the invariants given below we may derive a tabulation algorithm (named A3), which is presented here. First, we give the invariants of the program.

P1: (∀n : 1 ≤ n : ∀a ∈ Vn : new_a ≡ (∃j : 1 ≤ j ≤ n : pa,j < qa,j))

P2: 0 ≤ p ≤ q
    ∧ (∀n : 1 ≤ n : ∀a ∈ Vn :
        (∀j : 1 ≤ j ≤ n : 0 ≤ pa,j ≤ qa,j
          ∧ (∀i : 0 ≤ i < q ∧ 0 ≤ µa,j(i) < qa,j : Ra,j(µa,j(i)) = E(i) ∩ CSa,j))
        ∧ (∀i1, ..., in : (∀j : 1 ≤ j ≤ n : 0 ≤ ij < p ∧ 0 ≤ µa,j(ij) < pa,j)
            ∧ 0 ≤ T'a(µa,1(i1), ..., µa,n(in)) < q :
            E(T'a(µa,1(i1), ..., µa,n(in))) = fa(ga,1(E(i1)), ..., ga,n(E(in)))))

|[ con G = (N, V, r, P, S) : tree grammar
 ; var E : N →p P(PS)
       F : P(N)
       T'a : N                  (for all a ∈ V0)
       T'a : N^n → N            (for all n : 1 ≤ n : a ∈ Vn)
       µa,j : N → N             (for all n : 1 ≤ n : a ∈ Vn : j : 1 ≤ j ≤ n)
       Ra,j : N →p P(PS)        (for all n : 1 ≤ n : a ∈ Vn : j : 1 ≤ j ≤ n)
       pa,j, qa,j : N           (for all n : 1 ≤ n : a ∈ Vn : j : 1 ≤ j ≤ n)
       CSa,j : P(PS)            (for all n : 1 ≤ n : a ∈ Vn : j : 1 ≤ j ≤ n)
       new_a : bool             (for all n : 1 ≤ n : a ∈ Vn)
       p, q : N
       Nclosure : N → P(N)
 ; proc compute_Nclosure = (* see algorithm A2 *)
 ; func closure = (* see algorithm A2 *)
 ; proc compute_childsets =
   |[ var j, n : N
    | for all a ∈ V \ V0
      do j, n := 1, r(a)
       ; do j ≠ n + 1 → CSa,j := ∅ ; j := j + 1 od
      od
    ; for all a(t1, ..., tn) ∈ PS
      do j := 1
       ; do j ≠ n + 1 → CSa,j := CSa,j ∪ {tj} ; j := j + 1 od
      od
   ]|
 ; proc compute_reprsets = (| p : N |)
   (* compute representer-sets of match set E(p) for all a ∈ V \ V0 *)
   (* and fill the µa,j-tables, 1 ≤ j ≤ r(a), for E(p) *)
   |[ var j, n : N
    | for all a ∈ V \ V0
      do j, n := 1, r(a)
       ; do j ≠ n + 1 →
           Ra,j(qa,j) := E(p) ∩ CSa,j
         ; |[ var k : N
            | k := 0
            ; do Ra,j(k) ≠ Ra,j(qa,j) → k := k + 1 od
            ; if k ≠ qa,j → skip
              [] k = qa,j → (* new representer-set *)
                 qa,j, new_a := qa,j + 1, true
              fi
            ; µa,j(p) := k
           ]|
         ; j := j + 1
         od
      od
   ]|
   (* main program *)
 | compute_Nclosure()
 ; compute_childsets()
 ; for all a ∈ V \ V0 do
     |[ var j, n : N
      | j, n := 1, r(a)
      ; do j ≠ n + 1 → qa,j := 0 ; pa,j := 0 ; j := j + 1 od
     ]|
   ; new_a := false
   od
 ; q := 0 ; p := 0
 ; for all a ∈ V0
   do E(q) := closure({a})
    ; T'a, q := q, q + 1
   od
 ; do p ≠ q → compute_reprsets(p) ; p := p + 1 od
 ; do (∃a ∈ V \ V0 : new_a) →
     for all (a ∈ V \ V0 : new_a) do
       |[ var j, n : N
        | n := r(a)
        ; for all (p1, ..., pn) ∈ ({0..qa,1 − 1} × ... × {0..qa,n − 1}) \
                                   ({0..pa,1 − 1} × ... × {0..pa,n − 1})
          do E(q) := closure({a(t1, ..., tn) ∈ PS | ∀i : 1 ≤ i ≤ n : ti ∈ Ra,i(pi)})
           ; |[ var k : N
              | k := 0
              ; do E(k) ≠ E(q) → k := k + 1 od
              ; if k = q → q := q + 1 (* new match set *)
                [] k ≠ q → skip
                fi
              ; T'a(p1, ..., pn) := k
             ]|
          od
        ; j := 1
        ; do j ≠ n + 1 → pa,j := qa,j ; j := j + 1 od
       ]|
     ; new_a := false
     od
   ; do p ≠ q → compute_reprsets(p) ; p := p + 1 od
   od
 ; p := 0 ; F := ∅
   (* determine accepting states *)
 ; do p ≠ q →
     if S ∈ E(p) → F := F ∪ {p} [] S ∉ E(p) → skip fi
   ; p := p + 1
   od
]|

(24)

|[ con F : P(N)
       T_a : N                      for all a ∈ V0
       T_a : N^n → N                for all n : 1 ≤ n : a ∈ V_n
       μ_a,j : N → N                for all n : 1 ≤ n : a ∈ V_n : j : 1 ≤ j ≤ n
       t : Tree(V, r)
 ; var accepted : bool
 ; func ms2 = (t : Tree(V, r) | N
   | if t :: a → T_a
     | t :: a(t1, ... ,tn) → T_a(μ_a,1(ms2(t1)), ... , μ_a,n(ms2(tn)))
     fi
   )
 | accepted := ms2(t) ∈ F
]|
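The acceptor ms2 is a straightforward recursion over the subject tree: leaves are looked up directly, inner nodes index the compressed table through the index maps. A Python sketch (tree representation and table shapes are assumptions for illustration):

```python
# Sketch of the table-driven acceptor: trees are nested tuples
# ("a", t1, ..., tn), leaves are strings; T_leaf, T and mu are the tables
# produced by the tabulation.
def ms2(t, T_leaf, T, mu):
    if isinstance(t, str):                  # t = a, a nullary operator
        return T_leaf[t]
    a, children = t[0], t[1:]
    reps = tuple(mu[a][j][ms2(tj, T_leaf, T, mu)]
                 for j, tj in enumerate(children, start=1))
    return T[a][reps]

def accepts(t, T_leaf, T, mu, F):
    return ms2(t, T_leaf, T, mu) in F
```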

Compare the results given in the following example with those of example 4.9.

Example 5.12
Consider again our running example and the child sets of our grammar as given in example 5.2. The tables generated by algorithm A3 are:

p | match set E(p)             | μ_a,1 | μ_a,2 | μ_b,1
0 | {c; A, B}                  |   0   |   0   |   0
1 | {d; B}                     |   0   |   1   |   1
2 | ∅                          |   1   |   2   |   2
3 | {a(B,d); A, B}             |   0   |   0   |   1
4 | {b(c), b(B); B}            |   2   |   0   |   1
5 | {b(B); B}                  |   0   |   0   |   1
6 | {a(b(c), B); A, B}         |   0   |   0   |   1
7 | {a(B,d), a(b(c), B); A, B} |   0   |   0   |   1

  | R_a,1     |   | R_a,2  |   | R_b,1
0 | {B}       | 0 | {B}    | 0 | {c, B}
1 | ∅         | 1 | {d, B} | 1 | {B}
2 | {b(c), B} | 2 | ∅      | 2 | ∅

T_a (first index μ_a,1, second index μ_a,2):
    | 0  1  2
  0 | 2  3  2
  1 | 2  2  2
  2 | 6  7  2

T_c = 0, T_d = 1; F = {0, 3, 6, 7}

For example, take E(4), which equals {b(c),b(B),B}.


E(4) ∩ CS_a,2 = E(4) ∩ {B, d} = {B} = R_a,2(0), so μ_a,2(4) = 0
E(4) ∩ CS_b,1 = E(4) ∩ {c, B} = {B} = R_b,1(1), so μ_b,1(4) = 1
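These two intersection computations can be checked mechanically. A small Python check, with pattern trees written as strings for brevity and the representer sets listed in their construction order:

```python
# E(4) = {b(c), b(B), B}; intersecting with the child sets must give back a
# known representer set, whose index is the value of the mu-table entry.
E4 = {"b(c)", "b(B)", "B"}
CS_a2 = {"B", "d"}
CS_b1 = {"c", "B"}
R_a2 = [{"B"}, {"d", "B"}]       # R_a,2(0), R_a,2(1)
R_b1 = [{"c", "B"}, {"B"}]       # R_b,1(0), R_b,1(1)

mu_a2_of_4 = R_a2.index(E4 & CS_a2)   # == 0
mu_b1_of_4 = R_b1.index(E4 & CS_b1)   # == 1
```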

Bottom-up accepting of t = a(b(c), b(a(d,d))) now proceeds as depicted in figure 4.

[Figure 4: Example of matching using compressed tables; the leaves are labelled T_c = 0 and T_d = 1.]

Since 6 ∈ F, t ∈ L(G). □
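The trace of figure 4 can be replayed with the tables of example 5.12. In this sketch the μ and T entries are transcribed from the tables; the T_b entries (4, 5, 2) are not listed explicitly in the example and are inferred from the construction, so read them as an assumption:

```python
# Replaying figure 4: match sets are identified by their indices 0..7.
T_leaf = {"c": 0, "d": 1}
mu = {
    "a": {1: {0: 0, 1: 0, 2: 1, 3: 0, 4: 2, 5: 0, 6: 0, 7: 0},
          2: {0: 0, 1: 1, 2: 2, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}},
    "b": {1: {0: 0, 1: 1, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}},
}
T = {
    "a": {(0, 0): 2, (0, 1): 3, (0, 2): 2,
          (1, 0): 2, (1, 1): 2, (1, 2): 2,
          (2, 0): 6, (2, 1): 7, (2, 2): 2},
    "b": {(0,): 4, (1,): 5, (2,): 2},     # inferred T_b entries
}
F = {0, 3, 6, 7}

def ms2(t):
    if isinstance(t, str):
        return T_leaf[t]
    a, children = t[0], t[1:]
    reps = tuple(mu[a][j][ms2(tj)] for j, tj in enumerate(children, start=1))
    return T[a][reps]

t = ("a", ("b", "c"), ("b", ("a", "d", "d")))   # t = a(b(c), b(a(d,d)))
# ms2(t) == 6 and 6 is in F, so t is accepted
```

The intermediate results agree with the figure: b(c) reaches state 4, a(d,d) state 3, b(a(d,d)) state 5, and the root T_a(2, 0) = 6.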

6 Concluding remarks

We derived a rather efficient tabulation algorithm for table-driven bottom-up tree acceptors. The basic idea behind the optimized tabulation is quite simple; nevertheless it leads to a complex algorithm. The presented algorithms are derived by stepwise refinement: starting with the well-known reachability algorithm, we elaborate it towards an algorithm for the efficient generation of compressed parse tables. In our opinion, such a systematic derivation gives more insight into a complex algorithm.

Experiments with an implementation (in Pascal) of the algorithm have demonstrated a considerable improvement in table generation. For example, a code generation grammar with 33 production rules (representing a part of the Intel 8085 instruction set) gave an improvement from 9208 table entries to only 635, of which 468 are index map entries. Index map tables take up most of the space, but since index maps are inherently sparse, traditional compression techniques can be used to reduce that space further.
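One traditional compression technique for such sparse index maps is to store only the entries that differ from a default value. The sketch below is an illustration of this idea, not the report's implementation; the table contents are the μ_b,1 column of example 5.12:

```python
# Minimal sketch of compressing a sparse index map: most entries of mu_b,1
# in example 5.12 equal 1, so store the default plus the exceptions only.
def compress(table):
    values = list(table.values())
    default = max(set(values), key=values.count)    # most frequent value
    exceptions = {k: v for k, v in table.items() if v != default}
    return default, exceptions

def lookup(compressed, key):
    default, exceptions = compressed
    return exceptions.get(key, default)

mu_b1 = {0: 0, 1: 1, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
c = compress(mu_b1)
# c == (1, {0: 0, 2: 2}): 3 stored cells instead of 8
```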

Although the tabulation algorithm has an exponential time complexity, we believe that for code generation grammars the mapping of match sets onto equivalence classes is such that a worthwhile improvement is obtained. Though much work remains to be done in the field of universal code generator-generators, we hope to have made a contribution to bottom-up parsing algorithms.


Acknowledgements

Thanks are due to Huub ten Eikelder, Berry Schoenmakers and Pieter Struik (all of the Eindhoven University of Technology) for reading and commenting on draft versions of this paper.

References

[Aho]           A. V. Aho, R. Sethi & J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Mass., 1986.

[Brainerd]      W. S. Brainerd. Tree Generating Regular Systems. Information and Control, vol.14, pp.217-231, (1969).

[Chase]         D. R. Chase. An Improvement to Bottom-up Tree Pattern Matching. Proc. of the 14th ACM Conf. on Principles of Programming Languages, pp.168-177, (1987).

[Christopher a] T. W. Christopher, P. J. Hatcher & R. C. Kukuk. Using Dynamic Programming to Generate Optimized Code in a Graham-Glanville Style Code Generator. Proc. of the ACM 1984 Symp. on Compiler Construction, ACM SIGPLAN Notices, vol.19 no.6, pp.25-36, (1984).

[Christopher b] T. W. Christopher & P. J. Hatcher. High-Quality Code Generation Via Bottom-up Tree Pattern Matching. Proc. of the 13th ACM Conf. on Principles of Programming Languages, pp.119-130, (1986).

[Doner]         J. Doner. Tree Acceptors and Some of Their Applications. Journal of Computer and System Sciences, vol.4, pp.406-451, (1970).

[Graham]        S. L. Graham & R. S. Glanville. A New Method for Compiler Code Generation. Proc. of the 5th ACM Conf. on Principles of Programming Languages, pp.231-240, (1978).

[Hemerik]       C. Hemerik & Y. M. van Dinther. Acceptors and Parsers for Regular Tree Grammars. In preparation.

[Hoffmann]      C. M. Hoffmann & M. J. O'Donnell. Pattern Matching in Trees. Journal of the ACM, vol.29 no.1, pp.68-95, (1982).

[Turner]        P. K. Turner. Up-down Parsing with Prefix Grammars. ACM SIGPLAN Notices, vol.21 no.12, pp.167-174, (1986).

[Rem]           M. Rem. Small Programming Exercises 5. Science of Computer Programming, vol.4 no.3, (1984).


[Rounds] W. C. Rounds. Mappings and Grammars on Trees. Math. Syst. Theory, vol.4, pp.257-287, (1970).

[Van Dinther]   Y. M. van Dinther. The Systematic Derivation of Acceptors and Parsers for Tree Grammars. Master's Thesis, Eindhoven University of Technology, 1987.


In this series appeared:

85/01  R.H. Mak: The formal specification and derivation of CMOS-circuits
85/02  W.M.C.J. van Overveld: On arithmetic operations with M-out-of-N-codes
85/03  W.J.M. Lemmens: Use of a computer for evaluation of flow films
85/04  T. Verhoeff, H.M.J.L. Schols: Delay insensitive directed trace structures satisfy the foam rubber wrapper postulate
86/01  R. Koymans: Specifying message passing and real-time systems
86/02  G.A. Bussing, K.M. van Hee, M. Voorhoeve: ELISA, a language for formal specifications of information systems
86/03  Rob Hoogerwoord: Some reflections on the implementation of trace structures
86/04  G.J. Houben, J. Paredaens, K.M. van Hee: The partition of an information system in several parallel systems
86/05  Jan L.G. Dietz, Kees M. van Hee: A framework for the conceptual modeling of discrete dynamic systems
86/06  Tom Verhoeff: Nondeterminism and divergence created by concealment in CSP
86/07  R. Gerth, L. Shira: On proving communication closedness of distributed layers
86/08  R. Koymans, R.K. Shyamasundar, W.P. de Roever, R. Gerth, S. Arun Kumar: Compositional semantics for real-time distributed computing (Inf. & Control 1987)
86/09  C. Huizing, R. Gerth, W.P. de Roever: Full abstraction of a real-time denotational semantics for an OCCAM-like language
86/10  J. Hooman: A compositional proof theory for real-time distributed message passing
86/11  W.P. de Roever: Questions to Robin Milner - A responder's commentary (IFIP86)
86/12  A. Boucher: A timed failures model for
86/13  R. Gerth, W.P. de Roever: Proving monitors revisited: a first step towards verifying object oriented systems (Fund. Informatica IX-4)
86/14  R. Koymans: Specifying passing systems requires extending temporal logic
87/01  R. Gerth: On the existence of sound and complete axiomatizations of the monitor concept
87/02  Simon J. Klaver, Chris F.M. Verberne: Federatieve Databases
87/03  G.J. Houben, J. Paredaens: A formal approach to distributed information systems
87/04  T. Verhoeff: Delay insensitive codes - An overview
87/05  R. Kuiper: Enforcing non-determinism via linear time temporal logic specification
87/06  R. Koymans: Temporele logica specificatie van message passing en real-time systemen (in Dutch)
87/07  R. Koymans: Specifying message passing and real-time systems with real-time temporal logic
87/08  H.M.J.L. Schols: The maximum number of states after projection
87/09  J. Kalisvaart, L.R.A. Kessener, W.J.M. Lemmens, M.L.P. van Lierop, F.J. Peters, H.M.M. van de Wetering: Language extensions to study structures for raster graphics
87/10  T. Verhoeff: Three families of maximally nondeterministic automata
87/11  P. Lemmens: Eldorado ins and outs. Specifications of a data base management toolkit according to the functional model
87/12  K.M. van Hee, A. Lapinski: OR and AI approaches to decision support systems
87/13  J.C.S.P. van der Woude: Playing with patterns, searching for strings
87/14  J. Hooman: A compositional proof system for an occam-like real-time language
87/15  C. Huizing, R. Gerth, W.P. de Roever: A compositional semantics for statecharts
87/16  H.M.M. ten Eikelder, J.C.F. Wilmont: Normal forms for a class of formulas
87/17  K.M. van Hee, G.-J. Houben, J.L.G. Dietz: Modelling of discrete dynamic systems, framework and examples
87/18  C.W.A.M. van Overveld: An integer algorithm for rendering curved surfaces
87/19  A.J. Seebregts: Optimalisering van file allocatie in gedistribueerde database systemen (in Dutch)
87/20  G.J. Houben, J. Paredaens: The R2-Algebra: An extension of an algebra for nested relations
87/21  R. Gerth, M. Codish, Y. Lichtenstein, E. Shapiro: Fully abstract denotational semantics for concurrent PROLOG
88/01  T. Verhoeff: A Parallel Program That Generates the Mobius Sequence
88/02  K.M. van Hee, G.J. Houben, L.J. Somers, M. Voorhoeve: Executable Specification for Information Systems
88/03  T. Verhoeff: Settling a Question about Pythagorean Triples
88/04  G.J. Houben, J. Paredaens, D. Tahon: The Nested Relational Algebra: A Tool to Handle Structured Information
88/05  K.M. van Hee, G.J. Houben, L.J. Somers, M. Voorhoeve: Executable Specifications for Information Systems
88/06  H.M.J.L. Schols: Notes on Delay-Insensitive Communication
88/07  C. Huizing, R. Gerth, W.P. de Roever: Modelling Statecharts behaviour in a fully abstract way
88/08  K.M. van Hee, G.J. Houben, L.J. Somers, M. Voorhoeve: A Formal model for System Specification
88/09  A.T.M. Aerts, K.M. van Hee: A Tutorial for Data Modelling
88/10  J.C. Ebergen: A Formal Approach to Designing Delay Insensitive Circuits
88/11  G.J. Houben, J. Paredaens: A graphical interface formalism: specifying nested relational databases
88/12  A.E. Eiben: Abstract theory of planning
88/13  A. Bijlsma: A unified approach to sequences, bags, and trees
88/14  H.M.M. ten Eikelder, R.H. Mak: Language theory of a lambda-calculus with recursive types
88/15  R. Bos, C. Hemerik: An introduction to the category theoretic solution of recursive domain equations
88/16  C. Hemerik, J.P. Katoen: Bottom-up tree acceptors
88/17  K.M. van Hee, G.J. Houben, L.J. Somers, M. Voorhoeve: Executable specifications for discrete event systems
