Reducing Behavioural to Structural Properties of Programs with Procedures


Contents lists available at SciVerse ScienceDirect

Theoretical Computer Science

journal homepage: www.elsevier.com/locate/tcs

Dilian Gurov^a, Marieke Huisman^{b,∗}

^a KTH Royal Institute of Technology, Stockholm, Sweden

^b University of Twente, Netherlands

Article info

Article history:
Received 29 June 2011
Received in revised form 29 October 2012
Accepted 1 February 2013
Communicated by M. Hofmann

Keywords: Compositional reasoning; Control-flow behaviour; Control-flow structure; Modal µ-calculus; Program verification; Safety properties

Abstract

There is an intimate link between program structure and behaviour. Exploiting this link to phrase program correctness problems in terms of the structural properties of a program graph rather than in terms of its unfoldings is a useful strategy for making analyses more tractable. The present paper presents a characterisation of behavioural program properties through sets of structural properties by means of a translation. The characterisation is given in the context of a program model based on control flow graphs of sequential programs with procedures, abstracting away completely from program data, and properties expressed in a fragment of the modal µ-calculus with boxes and greatest fixed-points only. The property translation is based on a tableau construction that conceptually amounts to symbolic execution of the behavioural formula, collecting structural constraints along the way. By keeping track of the subformulae that have been examined, recursion in the structural constraints can be identified and captured by fixed-point formulae. The tableau construction terminates, and the characterisation is exact, i.e., the translation is sound and complete. A prototype implementation has been developed. In addition, we show how the translation can be extended beyond the basic flow graph model and safety logic to richer behavioural models (such as open programs) and richer program models (including Boolean programs), and discuss possible extensions for more complex logics. We present several applications of the characterisation, in particular sound and complete compositional verification for behavioural properties based on maximal models. © 2013 Elsevier B.V. All rights reserved.

1. Introduction

The relationship between a program’s syntactical structure and its behaviour is fundamental in program analysis. For example, type systems analyse the structure of a program to deduce properties about its behaviour, while program synthesis studies how to realise a program structure for a desired program behaviour. The relationship is often exploited to phrase program correctness problems in terms of the structure of a program rather than in terms of its behaviour, in order to make analyses more tractable. If program data is abstracted away, and only the control flow of programs with (possibly recursive) procedures is considered, the relation between structure and behaviour is well understood in one direction: program structure, essentially a finite “program graph”, can be represented by a pushdown system that induces program behaviour as an “unfolding” of the structure in a context-free manner. This representation has been exploited widely, for example for inter-procedural data flow analysis (e.g., in [1]) and for model checking of behavioural properties (e.g., in [2]).

∗ Corresponding author. Tel.: +31 534894662.
E-mail addresses: dilian@csc.kth.se (D. Gurov), M.Huisman@utwente.nl, marieke.huisman@ewi.utwente.nl (M. Huisman).
0304-3975/$ – see front matter © 2013 Elsevier B.V. All rights reserved.

However, in the other direction, this relationship is much less understood: given a program behaviour, how can one capture the program structures that admit this behaviour? A natural way to capture both program structure and behaviour is the use of temporal logic formulae: structural properties are concerned with the textual sequencing of instructions in a program, while behavioural properties consider their executional sequencing. Then, the relationship between structure and behaviour can be naturally characterised in the two directions through the following questions:

(1) when does a structural property entail a behavioural property, and

(2) can a behavioural property be captured by a finite set of structural properties?

This paper addresses this characterisation problem in the context of a program model based on control flow graphs of sequential programs with procedures (i.e., program data is abstracted away). Properties are expressed in a fragment of the modal µ-calculus with boxes and greatest fixed-points only, which is suitable for expressing safety properties (cf. [3]) in terms of sequences of method invocations, such as security policies restricting access to given resources by means of API method calls (cf. [4]). In previous work [5], we showed how this logic can be used for the specification and compositional verification of safety properties, both on the structural and on the behavioural level, and provided tool support and case studies. In particular, we derived an algorithmic solution to problem (1) stated above (see [5, p. 855]). Here, we give a precise solution to the (more complex) problem (2), showing that every disjunction-free behavioural formula can be precisely characterised by a finite set of structural formulae: a program satisfies the behavioural formula if and only if it satisfies some structural formula from the set. For example, using the results of this paper, one can derive that the behavioural property “method a never calls method b” is characterised precisely by the following two structural properties: “there is no call-to-b instruction in (the text of) method a” or “in (the text of) method a, every call-to-b instruction and every return instruction is preceded by some call-to-a instruction” (and hence, due to unbounded recursion, control can never reach a call-to-b instruction).

As mentioned above, the characterisation is defined for safety properties over a program model considering control flow only. This may seem like a severe restriction, but many useful and realistic properties can still be expressed at this level of abstraction, for example: no non-atomic method is called within a Java Card transaction (which is a mechanism to guarantee atomic updates); a method that changes certain sensitive data is only called from within a dedicated authentication method, i.e., unauthorised access is not possible; before program state is dumped into memory, a serialisation method is called to clean up the state; in a voting system, candidate selection has to be finished before the vote can be confirmed; and in a door access control system, the password has to be checked before the door is unlocked, and the password can only be changed if the door is unlocked. Of course, extending the technique with data over finite domains and liveness properties will allow for a wider range of properties and possible applications. We therefore discuss several extensions to our solution of the characterisation problem: we indicate how it could be adapted to the full modal µ-calculus, and we discuss how the approach can be extended to Boolean programs (i.e., programs with some limited form of data), programs with exceptions, and open components (i.e., programs that call and receive methods from an external component). The latter extension is essential to apply the characterisation for compositional verification.

Our solution to problem (2) is constructive: we define an explicit translation Π from behavioural properties into sets of structural properties. The translation has been implemented in OCaml and can be tested on-line [6]. Conceptually, the translation amounts to a symbolic execution of the behavioural formula, collecting induced structural constraints along the way. A considerable difficulty is presented by (greatest fixed-point) recursion in the behavioural formula, which has to be captured by recursion in the structural ones (in the absence of recursion it is considerably easier to define such a translation, as we show in [7]). We handle recursion by means of a tableau construction that maintains (during the symbolic execution) a symbolic “call stack” indicating which subformulae have been explored for which method. We use this stack (1) to identify when a (sub)formula has been sufficiently explored, so that a branch of the tableau can be finished, and (2) to identify recursion in the collected structural constraints and capture this by fixed-point formulae. We prove that the construction terminates. Moreover, we show that the construction is sound, and in case the behavioural formula is disjunction-free, also complete, by viewing the tableau system as a proof system.

An alternative solution to problem (2) can be based on the theory of nested words (see for instance [8,9]). Using the results of this theory, a µ-calculus formula can be translated into an equivalent formula in NT-µ, a fixpoint calculus for nested words, and then in turn be translated into an equivalent alternating parity nested tree automaton. The latter automaton has a structural content that, in principle, can be used as a representation of program structure. In contrast, our solution is “direct”, in the sense that our symbolic execution directly follows the operational semantics of the program model, which relates structure with behaviour. This makes our construction easy to adapt for variations and extensions of the model, as we explore in [10]. Furthermore, our tableau construction gives rise to a correctness argument that allows one to view a maximal tableau as a proof that the structural formulae resulting from the tableau entail the original behavioural formula.

Applications. In addition to its foundational value, the characterisation is useful in various ways.

In earlier work, we defined a maximal model construction for the logic considered here, and adapted it to the construction of maximal program structures from structural properties [5]. This makes it possible to verify global system properties in a setting where local components might change, vary, or be instantiated later. The combination of this construction with the property translation Π provides a solution to the problem of computing maximal program structures from behavioural properties. As Section 6 shows, this can be exploited to extend the compositional verification technique of [5], where local assumptions are structural, to local behavioural properties. This application uses the extension of the property translation to open components, described in Section 5.

Further, the translation can be used to reduce infinite-state verification of behavioural control flow properties to finite-state verification of structural properties. Thus, tools for checking structural properties can in effect be used for verifying behavioural ones. In particular, in a mobile code deployment scheme where the security policies of the platform are given as behavioural control flow properties, translating these into structural properties of the loaded applications enables efficient on-device conformance checking via static analysis. Moreover, the structural properties that are generated from a behavioural safety property make it possible to identify structural programming patterns that enforce the safety property; this can be useful in program design in the context of given safety policies. This schema is illustrated on a security policy from the Java Card domain in Section 7.

The translation can also be used to synthesise program structures with a certain required behaviour. This will in particular be useful once the logic is extended with liveness properties. It is future work to study this in detail.

Organisation. This paper is an extended version of [11]. It contains more examples, detailed proofs, and information about the implementation. Moreover, it describes several extensions of the translation, and it discusses the application to compositional verification in greater detail. It is organised as follows. Section 2 formally defines the program model and logic. Next, Section 3 defines the translation, by means of the tableau construction, which is proven sound and complete in Section 4. Then, Section 5 discusses a variety of possible extensions to the translation. Next, Section 6 shows how the characterisation is applied to develop a sound and complete compositional verification principle for local behavioural properties, while Section 7 shows how the characterisation is applied to verify security policies. Section 8 outlines the OCaml implementation of the translation. Finally, Sections 9 and 10 conclude with a discussion of related work, other possible extensions and optimisations.

2. Preliminaries: program model and logic

This section summarises the program model and logic for which we develop our property translation. For a more detailed account, the reader is referred to [12,5].

2.1. Specification and logic

First, we define the general notion of model.

Definition 1 (Model). A model is a structure $\mathcal{M} = (S, L, \rightarrow, A, \lambda)$, where $S$ is a set of states, $L$ a set of labels, ${\rightarrow} \subseteq S \times L \times S$ a labelled transition relation, $A$ a set of atomic propositions, and $\lambda : S \rightarrow \mathcal{P}(A)$ a valuation, assigning to each state $s$ the set of atomic propositions that hold in $s$. An initialised model is a pair $(\mathcal{M}, E)$, with $\mathcal{M}$ a model and $E \subseteq S$ a set of entry states.

As a property specification language, we use the fragment of the modal µ-calculus [13] with boxes and greatest fixed-points only. This temporal logic, which we term simulation logic, is capable of characterising simulation (cf. [12,5]) and is thus suitable for expressing safety properties. In our correctness proof, however, we will sometimes fall back on the full modal µ-calculus. Therefore, simulation logic is defined as a restriction of the full logic, where negation is restricted to atomic propositions only. Throughout, we fix a set of labels $L$, a set of atomic propositions $A$, and a set of propositional variables $V$.

Definition 2 (Logic, Simulation Logic). The formulae of the logic are inductively defined by:

$$\phi ::= p \mid X \mid \neg\phi \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid [\alpha]\phi \mid \nu X.\phi$$

where $p \in A$, $\alpha \in L$ and $X \in V$. Simulation logic is the fragment with negation over atomic propositions only.

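The semantics of the logic is defined in the standard fashion [13], through the notion of denotation $\|\phi\|^{\mathcal{M}}_{\rho}$ of a formula relative to a model $\mathcal{M}$ and environment $\rho$ that maps propositional variables to sets of states:

$$\begin{aligned}
\|p\|^{\mathcal{M}}_{\rho} &= \{ s \in S \mid p \in \lambda(s) \} \\
\|X\|^{\mathcal{M}}_{\rho} &= \rho(X) \\
\|\neg\phi\|^{\mathcal{M}}_{\rho} &= S \setminus \|\phi\|^{\mathcal{M}}_{\rho} \\
\|\phi_1 \wedge \phi_2\|^{\mathcal{M}}_{\rho} &= \|\phi_1\|^{\mathcal{M}}_{\rho} \cap \|\phi_2\|^{\mathcal{M}}_{\rho} \\
\|\phi_1 \vee \phi_2\|^{\mathcal{M}}_{\rho} &= \|\phi_1\|^{\mathcal{M}}_{\rho} \cup \|\phi_2\|^{\mathcal{M}}_{\rho} \\
\|[\alpha]\phi\|^{\mathcal{M}}_{\rho} &= \{ s \in S \mid \forall s' \in S.\ (s \xrightarrow{\alpha} s' \Rightarrow s' \in \|\phi\|^{\mathcal{M}}_{\rho}) \} \\
\|\nu X.\phi\|^{\mathcal{M}}_{\rho} &= \bigcup \{ S' \subseteq S \mid S' \subseteq \|\phi\|^{\mathcal{M}}_{\rho[S'/X]} \}.
\end{aligned}$$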
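The denotation clauses above can be turned into a small evaluator for finite models. The following is only a sketch under stated assumptions, not the implementation referred to in the paper: formulae are encoded as nested tuples (an encoding chosen here for illustration), the model is finite, and bound variables occur only positively, so that the greatest fixed point can be computed by iterating downwards from the full state set. The three-node graph at the end is hypothetical.

```python
def denote(f, states, trans, val, env=None):
    """Denotation of formula f over a finite model (states, trans, val).

    Formulae are tuples (illustrative encoding, not from the paper):
      ("prop", p), ("var", X), ("not", f), ("and", f, g), ("or", f, g),
      ("box", label, f), ("nu", X, f).
    """
    env = env or {}
    kind = f[0]
    rec = lambda g, e=env: denote(g, states, trans, val, e)
    if kind == "prop":
        return {s for s in states if f[1] in val[s]}
    if kind == "var":
        return set(env[f[1]])
    if kind == "not":
        return set(states) - rec(f[1])
    if kind == "and":
        return rec(f[1]) & rec(f[2])
    if kind == "or":
        return rec(f[1]) | rec(f[2])
    if kind == "box":
        target = rec(f[2])
        # s satisfies [a]f iff every a-successor of s lies in the denotation of f
        return {s for s in states
                if all(t in target for (u, a, t) in trans if u == s and a == f[1])}
    if kind == "nu":
        # Assumption: f[2] is monotone in f[1], so iterating from the full
        # state set reaches the greatest fixed point on a finite model.
        current = set(states)
        while True:
            nxt = rec(f[2], {**env, f[1]: current})
            if nxt == current:
                return current
            current = nxt
    raise ValueError(f"unknown formula kind: {kind}")


# Hypothetical three-node graph: 0 --eps--> 1 --odd--> 2, node 2 is a return node.
STATES = {0, 1, 2}
TRANS = {(0, "eps", 1), (1, "odd", 2)}
VAL = {0: {"even"}, 1: {"even"}, 2: {"even", "r"}}

R = ("prop", "r")
# nu X. [even] r /\ [odd] r /\ [eps] X
TAIL = ("nu", "X", ("and", ("box", "even", R),
                    ("and", ("box", "odd", R), ("box", "eps", ("var", "X")))))

holds_at_entry = 0 in denote(TAIL, STATES, TRANS, VAL)
```

Removing `r` from node 2 makes the formula fail at node 0, because the call edge out of node 1 then no longer leads to a return node.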

Satisfaction on states $(\mathcal{M}, s) \models \phi$ (also denoted $s \models_{\mathcal{M}} \phi$) for closed formulae $\phi$ is then defined as $s \in \|\phi\|^{\mathcal{M}}_{\rho}$ for an arbitrary $\rho$. For instance, formula $[\alpha]\phi$ holds of state $s$ in model $\mathcal{M}$ if $\phi$ holds in all states accessible from $s$ via an edge labelled $\alpha$. An initialised model $(\mathcal{M}, E)$ satisfies a formula $\phi$, denoted $(\mathcal{M}, E) \models \phi$, if all its entry states $E$ satisfy $\phi$. The constant formulae true (denoted tt) and false (ff) are definable. For convenience, we use $\phi \Rightarrow \psi$ to abbreviate $\neg\phi \vee \psi$.

. We assume that formulae have pair-wise distinct fixed-point binders, and unless stated otherwise, are closed,

i.e., all propositional variables are in the scope of a fixed-point binder, and guarded, i.e., every occurrence of a propositional

If one is only interested in the observable behaviour of a system, one can identify a distinguished silent action $\varepsilon \in L$, and define weak transitions $s \stackrel{\alpha}{\Rightarrow} t$ in terms of the usual (strong) transitions as follows: $s \stackrel{\varepsilon}{\Rightarrow} t$ whenever $s \,(\xrightarrow{\varepsilon})^{*}\, t$, and $s \stackrel{\alpha}{\Rightarrow} t$ whenever $s \stackrel{\varepsilon}{\Rightarrow} \xrightarrow{\alpha} \stackrel{\varepsilon}{\Rightarrow} t$, for all $\alpha \neq \varepsilon$. One can then interpret the box modality of the logic over the weak transitions rather than the strong transitions of models: $(\mathcal{M}, s) \models_w [\alpha]\phi$ holds if and only if $\phi$ holds in all states accessible from $s$ via an edge labelled $\alpha$, preceded and followed by an arbitrary number of $\varepsilon$-steps. There is, however, a standard translation of formulae interpreted over weak transitions into equivalent formulae interpreted over strong transitions [15]. This translation, let us denote it by $\delta$, has the property that $(\mathcal{M}, s) \models_w \phi$ exactly when $(\mathcal{M}, s) \models \delta(\phi)$.
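To make the definition of weak transitions concrete, the sketch below computes the weak successors of a state in a finite model directly from its strong transitions. It is an illustration only: the function names and the label string `"eps"` for the silent action are choices made here, and it does not implement the translation δ of [15].

```python
def eps_closure(s, trans, eps="eps"):
    """All states reachable from s by zero or more silent (eps) steps."""
    seen, todo = {s}, [s]
    while todo:
        u = todo.pop()
        for (x, a, t) in trans:
            if x == u and a == eps and t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def weak_successors(s, label, trans, eps="eps"):
    """States t such that s ==label==> t holds as a weak transition."""
    before = eps_closure(s, trans, eps)
    if label == eps:
        # s ==eps==> t iff t is reachable by silent steps only
        return before
    # eps* . label . eps*
    mid = {t for (u, a, t) in trans if u in before and a == label}
    result = set()
    for t in mid:
        result |= eps_closure(t, trans, eps)
    return result


# Hypothetical model: 0 --eps--> 1 --a--> 2 --eps--> 3
TRANS = {(0, "eps", 1), (1, "a", 2), (2, "eps", 3)}
weak_a = weak_successors(0, "a", TRANS)
weak_eps = weak_successors(0, "eps", TRANS)
```

A weak box formula can then be checked by requiring its body at every element of `weak_successors(s, label, trans)`.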

2.2. Control flow structure and behaviour

Our program model is control-flow based and thus over-approximates actual program behaviour. It defines two different views on programs: a structural and a behavioural one. Both views are instantiations of the general notion of model. Notice in particular that these instantiations yield a structural and a behavioural version of the logic. Again, we refer to [12,5] for more detail.

Notice that in this paper we assume that the program model only models a program’s public methods. After extracting a program model from a program, private methods can be removed in a way that preserves the (public) behaviour by using our inlining algorithm, as described in [16,5].

Control flow structure. As we abstract away from all data, program structure is defined as a collection of control flow graphs (or flow graphs), one for each of the program’s methods (Section 5.3 discusses how data can be added to the program model). Let Meth be a countably infinite set of method names, ranged over by m, i, a, and b. A method specification is an instance of the general notion of initialised model.

Definition 3 (Method Graph). A flow graph for $m \in Meth$ over a finite set $M \subseteq Meth$ of method names is a finite model $\mathcal{M}_m = (V_m, L_m, \rightarrow_m, A_m, \lambda_m)$, with $V_m$ the set of control nodes of $m$, $L_m = M \cup \{\varepsilon\}$, $A_m = \{m, r\}$, and $\lambda_m : V_m \rightarrow \mathcal{P}(A_m)$ so that $m \in \lambda_m(v)$ for all $v \in V_m$ (i.e., each node is tagged with its method name). The nodes $v \in V_m$ with $r \in \lambda_m(v)$ are return points. A method graph for $m \in Meth$ over $M$ is a pair $(\mathcal{M}_m, E_m)$, such that $\mathcal{M}_m$ is a flow graph for $m$ over $M$ and $E_m \subseteq V_m$ a non-empty set of entry points of $m$.

Thus, for method graphs, the only atomic propositions that we consider are method name propositions $m \in M$ and return node propositions $r$.

Next, we define flow graph interfaces. These ensure that control flow graphs can only be composed if their interfaces match.

Definition 4 (Flow Graph Interface). A flow graph interface is a pair $I = (I^+, I^-)$, where $I^+, I^- \subseteq Meth$ are finite sets of names of provided and (externally) required methods, respectively.¹ The composition of two interfaces $I_1 = (I_1^+, I_1^-)$ and $I_2 = (I_2^+, I_2^-)$ is defined by $I_1 \cup I_2 = (I_1^+ \cup I_2^+,\ (I_1^- \cup I_2^-) - (I_1^+ \cup I_2^+))$.
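Interface composition is a purely set-theoretic operation, which the following sketch spells out. Interfaces are represented here as pairs of frozensets (a representation chosen for illustration), and the two example interfaces are hypothetical ones for methods named `even` and `odd` that call each other.

```python
def compose(i1, i2):
    """Compose two interfaces (provided, required) as in Definition 4."""
    provided = i1[0] | i2[0]
    # Methods provided by either side are no longer externally required.
    required = (i1[1] | i2[1]) - provided
    return (provided, required)


def is_closed(interface):
    """An interface is closed if no external methods are required."""
    return not interface[1]


# Hypothetical interfaces: each method provides itself and requires the other.
I_EVEN = (frozenset({"even"}), frozenset({"odd"}))
I_ODD = (frozenset({"odd"}), frozenset({"even"}))

I_BOTH = compose(I_EVEN, I_ODD)
```

Composing the two interfaces yields a closed interface, since each required method is provided by the other component.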

The flow graph of a program is essentially the (disjoint) union of its method graphs. To formally define the notion of flow graph with interface, we use the notion of disjoint union of initialised models $(\mathcal{M}_1, E_1) \uplus (\mathcal{M}_2, E_2)$, where each state is tagged with 1 or 2, respectively, and $(s, i) \xrightarrow{\alpha}_{\mathcal{M}_1 \uplus \mathcal{M}_2} (t, i)$, for $i \in \{1, 2\}$, if and only if $s \xrightarrow{\alpha}_{\mathcal{M}_i} t$.

Definition 5 (Flow Graph). A flow graph $\mathcal{G}$ with interface $I$, written $\mathcal{G} : I$, is defined inductively by
• $(\mathcal{M}_m, E_m) : (\{m\}, M - \{m\})$ if $(\mathcal{M}_m, E_m)$ is a method graph for $m \in Meth$ over $M$, and
• $\mathcal{G}_1 \uplus \mathcal{G}_2 : I_1 \cup I_2$ if $\mathcal{G}_1 : I_1$ and $\mathcal{G}_2 : I_2$.

Example 6. Fig. 1 shows a simple Java class and the (simplified) flow graph it induces. The flow graph consists of two method graphs: one for method even and one for method odd. Entry nodes are depicted as usual through edges without source.

A flow graph is closed if $I^- = \emptyset$, i.e., it does not require any external methods.

Satisfaction, instantiated to flow graphs, is called structural satisfaction $\models_s$, i.e., $\mathcal{G} \models_s \chi \Leftrightarrow \mathcal{G} \models \chi$.

Example 7. Given the flow graph in Example 6, the structural formula $\nu X.\ [\mathsf{even}]\, r \wedge [\mathsf{odd}]\, r \wedge [\varepsilon]\, X$ expresses the property “on every path from a program entry node, the first encountered call edge goes to a return node”, in effect specifying that the program is tail-recursive.

1 We only require $I^-$ to contain the methods that are not provided by $I^+$. This is different from the definitions in our earlier work (e.g., [5]), but in line

Fig. 1. A simple Java class and its flow graph.

Control flow behaviour. Next, we instantiate initialised models on the behavioural level. We use transition label $\tau$ to designate internal transfer of control, label $m_1\ \mathsf{call}\ m_2$ for the invocation of method $m_2$ by method $m_1$, and label $m_2\ \mathsf{ret}\ m_1$ for the corresponding return from the call.

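Definition 8 (Behaviour). Let $\mathcal{G} = (\mathcal{M}, E) : I$ be a closed flow graph where $\mathcal{M} = (V, L, \rightarrow, A, \lambda)$. The behaviour of $\mathcal{G}$ is defined as model $b(\mathcal{G}) = (\mathcal{M}_b, E_b)$, where $\mathcal{M}_b = (S_b, L_b, \rightarrow_b, A_b, \lambda_b)$, such that $S_b = V \times V^*$, i.e., states are pairs of control points $v$ and stacks $\sigma$ (also called configurations), $L_b = \{ m_1\, k\, m_2 \mid k \in \{\mathsf{call}, \mathsf{ret}\},\ m_1, m_2 \in I^+ \} \cup \{\tau\}$, $A_b = A$, $\lambda_b((v, \sigma)) = \lambda(v)$, and ${\rightarrow_b} \subseteq S_b \times L_b \times S_b$ is defined by the rules:

(transfer) $(v, \sigma) \xrightarrow{\tau}_b (v', \sigma)$ if $m \in I^+$, $v \xrightarrow{\varepsilon}_m v'$, $v \models \neg r$

(call) $(v_1, \sigma) \xrightarrow{m_1\,\mathsf{call}\,m_2}_b (v_2, v_1' \cdot \sigma)$ if $m_1, m_2 \in I^+$, $v_1 \xrightarrow{m_2}_{m_1} v_1'$, $v_1 \models \neg r$, $v_2 \models m_2$, $v_2 \in E$

(return) $(v_2, v_1 \cdot \sigma) \xrightarrow{m_2\,\mathsf{ret}\,m_1}_b (v_1, \sigma)$ if $m_1, m_2 \in I^+$, $v_2 \models m_2 \wedge r$, $v_1 \models m_1$

The set of initial configurations is defined by $E_b = E \times \{\epsilon\}$, where $\epsilon$ denotes the empty sequence over $V$.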

Flow graph behaviour can alternatively be defined via pushdown automata (PDA) [5, Def. 34]. This can be exploited by using PDA model checking for verifying behavioural properties (see for instance [17,2,18]).

Example 9. Consider the flow graph from Example 6. Because of possible unbounded recursion, it induces an infinite-state behaviour. One example execution of the program is represented by the following path (in the branching structure) from an initial to a final configuration:

$$\begin{aligned}
(v_0, \epsilon) &\xrightarrow{\tau}_b (v_1, \epsilon) \xrightarrow{\tau}_b (v_2, \epsilon) \xrightarrow{\mathsf{even\ call\ odd}}_b (v_5, v_3) \xrightarrow{\tau}_b (v_6, v_3) \xrightarrow{\tau}_b (v_7, v_3) \\
&\xrightarrow{\mathsf{odd\ call\ even}}_b (v_0, v_9 \cdot v_3) \xrightarrow{\tau}_b (v_1, v_9 \cdot v_3) \xrightarrow{\tau}_b (v_4, v_9 \cdot v_3) \\
&\xrightarrow{\mathsf{even\ ret\ odd}}_b (v_9, v_3) \xrightarrow{\mathsf{odd\ ret\ even}}_b (v_3, \epsilon).
\end{aligned}$$
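The three rules of Definition 8 can be read as a successor function on configurations. The sketch below implements them and replays the path of Example 9. Since Fig. 1 itself is not reproduced in this text, the graph used here is an assumption: it contains only the nodes and edges that the path above visits, and it assumes that v3, v4 and v9 are return nodes. Stacks are represented as tuples with the top element first.

```python
EPS = "eps"

# Assumed fragment of the flow graph of Example 6, reconstructed from the
# path in Example 9 only (edges and nodes not on that path are omitted).
EDGES = [("v0", EPS, "v1"), ("v1", EPS, "v2"), ("v1", EPS, "v4"),
         ("v2", "odd", "v3"),
         ("v5", EPS, "v6"), ("v6", EPS, "v7"), ("v7", "even", "v9")]
METHOD_OF = {"v0": "even", "v1": "even", "v2": "even", "v3": "even",
             "v4": "even", "v5": "odd", "v6": "odd", "v7": "odd", "v9": "odd"}
RETURNS = {"v3", "v4", "v9"}          # assumed return nodes
ENTRIES = {"even": ["v0"], "odd": ["v5"]}


def successors(config):
    """Labelled successors of a configuration (node, stack)."""
    v, stack = config
    out = []
    if v not in RETURNS:
        for (u, a, w) in EDGES:
            if u != v:
                continue
            if a == EPS:
                # (transfer): follow an eps edge, stack unchanged
                out.append(("tau", (w, stack)))
            else:
                # (call): jump to an entry node of a, push the return site w
                for e in ENTRIES[a]:
                    out.append((f"{METHOD_OF[v]} call {a}", (e, (w,) + stack)))
    elif stack:
        # (return): pop the return site and continue there
        caller = stack[0]
        out.append((f"{METHOD_OF[v]} ret {METHOD_OF[caller]}",
                    (caller, stack[1:])))
    return out


def replay(path):
    """Check that path is a run of the behaviour; return its labels."""
    labels = []
    for cur, nxt in zip(path, path[1:]):
        match = [lab for (lab, cfg) in successors(cur) if cfg == nxt]
        if not match:
            raise ValueError(f"no transition from {cur} to {nxt}")
        labels.append(match[0])
    return labels


PATH = [("v0", ()), ("v1", ()), ("v2", ()), ("v5", ("v3",)), ("v6", ("v3",)),
        ("v7", ("v3",)), ("v0", ("v9", "v3")), ("v1", ("v9", "v3")),
        ("v4", ("v9", "v3")), ("v9", ("v3",)), ("v3", ())]
LABELS = replay(PATH)
```

Because the stack can grow without bound under mutual recursion, the set of reachable configurations is infinite, which is why the sketch only enumerates successors on demand.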

Also on the behavioural level, we instantiate the definition of satisfaction: we define $\mathcal{G} \models_b \phi$ as $b(\mathcal{G}) \models \phi$. The resulting behavioural logic is sufficiently powerful to express the class of security policies defined by finite state security automata (cf. e.g., [4]).

Example 10. For the flow graph from Example 6, the behavioural formula $\mathsf{even} \Rightarrow \nu X.\ [\mathsf{even\ call\ even}]\,\mathrm{ff} \wedge [\tau]\, X$ expresses the property “in every program execution starting in method even, the first call is not to method even itself”.

Clean flow graphs. Method graphs allow return points to have outgoing edges. However, the characterisation of behavioural properties by a set of structural formulae defined below is only correct if the flow graph has no such edges; such flow graphs are called clean. We define cleaning as a behaviour-preserving unary operation on method graphs, and lift it to flow graphs.

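Definition 11 (Cleaning). Given a method graph $\mathcal{M}_m = (V_m, L_m, \rightarrow_m, A_m, \lambda_m)$, the unary operation of cleaning is defined by:

$$(\mathcal{M}_m)^{\bullet} = (V_m, L_m, \{ s \xrightarrow{\alpha}_m t \mid s \xrightarrow{\alpha}_m t \wedge r \notin \lambda_m(s) \}, A_m, \lambda_m).$$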
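As a small illustration of Definition 11, cleaning amounts to filtering the edge set: every edge whose source is a return point is dropped, and everything else is kept. The sketch below represents edges as triples and return points as a set of node names; both the representation and the two-node example graph are hypothetical.

```python
def clean(edges, returns):
    """Remove all edges that leave a return point (Definition 11)."""
    return {(s, a, t) for (s, a, t) in edges if s not in returns}


# Hypothetical method graph in which return node v1 has an outgoing edge.
EDGES = {("v0", "eps", "v1"), ("v1", "eps", "v0")}
RETURNS = {"v1"}

CLEANED = clean(EDGES, RETURNS)
```

Applying `clean` a second time changes nothing, since no remaining edge starts at a return point.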

It is easy to see that cleaned flow graphs are clean. Cleaning is idempotent ($(\mathcal{G}^{\bullet})^{\bullet} = \mathcal{G}^{\bullet}$) and preserves behavioural properties ($\mathcal{G} \models_b \phi \Leftrightarrow \mathcal{G}^{\bullet} \models_b \phi$). Moreover, any node that is a return point trivially satisfies any structural box formula ($(\mathcal{G}^{\bullet}, s) \models_s r \Rightarrow \forall \alpha, \chi.\ (\mathcal{G}^{\bullet}, s) \models_s [\alpha]\chi$). Below, we will use that the set of nodes that can be reached by a behaviour

Notational conventions. We use label $\varepsilon$ for transfer edges in flow graph structures, and $\tau$ for silent behavioural transitions, while $\epsilon$ denotes the empty sequence.

In our translation of simulation logic formulae we allow sequences $\omega$ of labels to appear in box modalities, with the obvious translation $\hat{\cdot}$ to standard formulae: $\widehat{[\epsilon]\psi} = \psi$ and $\widehat{[\alpha \cdot \omega]\psi} = [\alpha]\,\widehat{[\omega]\psi}$, where $\psi$ is already a standard formula.

3. Mapping behavioural into structural properties

This section defines a mapping Π from behavioural properties to sets of structural properties, both in simulation logic. As mentioned above, the implementation of the mapping can be tested on-line. Throughout the section we assume that flow graphs are clean, and that behavioural properties are disjunction-free; in Section 5.4.1 we discuss how Π can be extended to behavioural formulae with disjunction, though at the expense of completeness. We show that Π computes, from a behavioural property $\phi$ and closed interface $I$, a set of structural formulae that characterises $\phi$ and $I$. That is, for any closed flow graph $\mathcal{G}$ with interface $I$ and any behavioural formula $\phi$ that only mentions labels that are in the behaviour of $\mathcal{G}$ (i.e., $L_b$ in Definition 8):

$$\mathcal{G} \models_b \phi \iff \exists \chi \in \Pi_I(\phi).\ \mathcal{G} \models_s \chi. \qquad (1)$$

To deal with the fixed-point formulae of the logic, the mapping Π is defined with the help of a tableau construction. A behavioural formula $\phi$ gives rise to a (maximal) tableau that induces a set of structural formulae through its leaves. The constructed tableau is finite, i.e., tableau construction terminates.

Our translation is based on a symbolic execution of the behavioural property by means of a tableau construction. When tracing a symbolic execution path, we tag all subformulae of the formula with unique propositional constants from a set $\mathcal{C}$. We use a global, injective map $\mathcal{S} : \phi \rightarrow \mathcal{C}$ to map formulae to their tags. We consider $\mathcal{S}$ as an implicit parameter of the tableau construction (and where necessary, we also use its inverse $\mathcal{S}^{-1}$). The tableau construction operates on sequents of the shape $\vdash_{H,U,C} \phi$ parametrised on:

• a non-empty history stack $H \in (I^+ \times (I^+ \cup \{\varepsilon\} \cup \mathcal{C})^*)^+$, where each element is a pair consisting of the current method name and a sequence (called frame) of edge labels and propositional constants abbreviating subformulae of $\phi$. For any frame $F \in (I^+ \cup \{\varepsilon\} \cup \mathcal{C})^*$, we use $\overline{F}$ to denote this frame cleaned from propositional constants² $X \in \mathcal{C}$:

$$\overline{\epsilon} = \epsilon \qquad \overline{m \cdot \sigma} = m \cdot \overline{\sigma} \qquad \overline{\varepsilon \cdot \sigma} = \varepsilon \cdot \overline{\sigma} \qquad \overline{X \cdot \sigma} = \overline{\sigma}$$

• a fixed-point stack $U$, defining an environment for propositional variables by means of a sequence of definitions of the shape $X = \nu X.\psi$; an open formula $\phi$ in a sequent parametrised by $U$ can then be understood via a suitable notion of substitution, based on the standard notion of substitution $\psi\{\theta/X\}$ of a formula $\theta$ for a propositional variable $X$ in a formula $\psi$: the substitution of $\phi$ under $U$ is inductively defined as follows:

$$\phi[\epsilon] = \phi \qquad \phi[(X = \nu X.\psi) \cdot U] = (\phi\{\nu X.\psi / X\})[U]$$

• a store $C$, used for accumulating structural constraints during symbolic execution.³

We use $\emptyset_{H,m}$, $\emptyset_U$ and $\emptyset_C$ to denote the single-element history stack $(m, \epsilon)$ and the empty fixed-point stack and store, respectively.

For a given closed behavioural formula $\phi$ and method $m$, we construct a maximal tableau with root $\vdash_{\emptyset_{H,m}, \emptyset_U, \emptyset_C} \phi$ that induces a set of structural formulae through its leaves, as described below. We denote the set of induced structural formulae for $\phi$ and $m$ with $\pi_m(\phi)$. We then define the translation of $\phi$ w.r.t. a given interface $I$ as all possible conjunctions of the induced structural formulae for each method that is provided by $I$:

$$\Pi_I(\phi) = \Big\{ \bigwedge_{m \in I^+} \chi_m \ \Big|\ \chi_m \in \pi_m(\phi) \Big\}.$$
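The definition of $\Pi_I(\phi)$ picks one induced formula per provided method and conjoins the choices, i.e., it ranges over the Cartesian product of the sets $\pi_m(\phi)$. The sketch below shows only this combinatorial step; it assumes the per-method sets have already been computed by the tableau construction and are given as a dictionary, and the formula names such as `chi_a1` are placeholders, not formulae from the paper.

```python
from itertools import product


def big_pi(pi_m):
    """All conjunctions that choose one formula per provided method.

    pi_m maps each method name in I+ to the list of its induced structural
    formulae. A conjunction is represented as a tuple of (method, formula)
    pairs, one pair per method.
    """
    methods = sorted(pi_m)
    return [tuple(zip(methods, choice))
            for choice in product(*(pi_m[m] for m in methods))]


# Placeholder input: two candidate formulae for method a, one for method b.
PI_M = {"a": ["chi_a1", "chi_a2"], "b": ["chi_b1"]}
CONJUNCTIONS = big_pi(PI_M)
```

The size of the result is the product of the sizes of the per-method sets, so a single method with an empty set of induced formulae makes the whole translation empty.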

During tableau construction, the history stack, fixed-point stack and store are updated as follows, provided that the current sequent is not a repeat of an earlier sequent (see below):

(1) First, if $\phi$ is not a fixed-point formula, the propositional constant $\mathcal{S}(\phi)$ tagging the behavioural property $\phi$ of the current sequent is appended to the end of the frame of the top element of $H$;⁴

(2) Next,

• if $\phi$ is a conjunction, both conjuncts are explored in two separate branches;

• if $\phi$ is (the negation of) an atomic proposition, exploration terminates for this branch, and a set of structural constraints based on the atomic proposition and the current history stack are produced;

• if the behavioural property $\phi$ prescribes an internal transfer, i.e., is of shape $[\tau]\phi'$, then $\varepsilon$ is appended to the end of the frame of the top element of $H$, and we continue the symbolic execution with formula $\phi'$;

• if $\phi$ prescribes a call from $a$ to $b$, i.e., is of shape $[a\ \mathsf{call}\ b]\phi'$, and the top element of $H$ is in method $a$, then $b$ is added at the end of the frame of the top element of $H$, a new element $(b, \epsilon)$ is pushed onto $H$, and we continue with formula $\phi'$;

• if $\phi$ prescribes a return from $a$ to $b$, i.e., is of shape $[a\ \mathsf{ret}\ b]\phi'$, the top element of $H$ is in method $a$ and the next element is in method $b$, then a new structural constraint is added to the store, reflecting the possibility of currently not being at a return point, the top element is popped from $H$, and we continue with formula $\phi'$;

• if $\phi$ is a fixed-point formula $\nu X.\phi'$, then a new equation $X = \nu X.\phi'$ is pushed onto the fixed-point stack $U$, if not already there, this conditional addition being denoted by $(X = \nu X.\phi') \circ U$, and we continue with formula $X$;

• if $\phi$ is a propositional variable $X$ for which there is an equation $X = \nu X.\phi'$ in the fixed-point stack $U$, then we continue with formula $\phi'$.

2 We overload the symbols used for propositional variables for reasons that will become clear later, when defining induced structural formulae in Section 3.3.

3 Using stores can in principle be dispensed with, but simplifies the presentation of the extraction of structural formulae and the correctness proofs.

4 Alternatively, instead of using propositional constants as tags, we could introduce fresh propositional variables, and add their defining equations to the fixed-point stack.

The tags are used to signal repetition in the symbolic execution of the formula; they ensure termination of the tableau construction. The structural constraints and the elements in the call stack denote conditions under which the property holds. Each step of the symbolic execution essentially adds new constraints.

Example 12. To illustrate symbolic execution informally, consider the behavioural property "invocation of a method cannot return without making a method call", which can be formalised as $\nu X.\,\neg r \wedge [\tau]X$. When executing this formula symbolically for method $a$ we perform the following steps:

(1) We start with initial history stack $(a, \epsilon)$ and formula $\nu X.\,\neg r \wedge [\tau]X$.

(2) The equation $X = \nu X.\,\neg r \wedge [\tau]X$ is pushed onto the fixed-point stack, and we proceed with formula $X$.

(3) The definition of $X$ is retrieved from the fixed-point stack, and we continue with formula $\neg r \wedge [\tau]X$.

(4) We have a conjunction, so each conjunct is explored separately.

(5) The first conjunct $\neg r$ is the negation of an atomic proposition, and therefore exploration terminates for this branch, producing a constraint that essentially requires $\neg r$ to hold.

(6) The second conjunct $[\tau]X$ appends $\varepsilon$ to the frame, i.e., the history stack becomes $(a, \varepsilon)$, and we proceed with formula $X$.

(7) This is recognised as a repeat of a situation that arose before (at step 3), therefore the exploration terminates, producing a constraint that essentially requires $X$ to hold for all nodes that can be reached by passing a transfer edge.

The two constraints produced by this symbolic execution are combined to obtain the recursive structural formula $a \Rightarrow \nu X.\,\neg r \wedge [\varepsilon]X$. The complete tableau for this formula is presented below in Fig. 4, together with several other example tableaux. Notice that non-emptiness of the history stack and closedness of $\phi[U]$ are invariants of the tableau construction. Using the translation $\delta$ of formulae interpreted over weak transitions into equivalent formulae interpreted over strong transitions discussed in Section 2.1, the property translation $\Pi$ can be applied to reduce properties of the observable behaviour of flow graphs into corresponding sets of structural properties.

3.1. Tableau system

The tableau system is given in Fig. 2 as a set of goal-directed rules. Axioms are presented as rules with an empty set of premises denoted by '$-$'. The condition $\mathsf{Ret}(i, a, b, H)$ used in the return rules is defined as $i = a \wedge H \neq \epsilon \wedge \exists F', H'.\ H = (b, F') \cdot H'$, i.e., there is a pending call from method $b$ on the top of the history stack.
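As a small sketch of the side condition $\mathsf{Ret}(i, a, b, H)$, assume (hypothetically) that a history stack is a Python list of `(method, frame)` pairs with the top entry first, and that `H` is the part of the stack below the current entry $(i, F)$. The function name `ret_condition` is likewise an illustrative choice.

```python
from typing import List, Tuple

Frame = Tuple[str, ...]      # sequence of tags, method names and 'eps' (assumed encoding)
Entry = Tuple[str, Frame]    # (method name, frame)


def ret_condition(i: str, a: str, b: str, H: List[Entry]) -> bool:
    """Ret(i, a, b, H): the current method i is a, H is non-empty,
    and the topmost entry of H belongs to method b."""
    return i == a and len(H) > 0 and H[0][0] == b


# In method b with a pending call from a: rule ret1 applies to [b ret a]phi.
pending_from_a = ret_condition("b", "b", "a", [("a", ("X1", "X2", "X3", "b"))])
# In method a with nothing below on the stack: rule ret0 applies instead.
no_pending = ret_condition("a", "b", "a", [])
```

The two calls mirror the situations in which the rules ret1 and ret0 are applied to a formula of shape $[b\ \mathsf{ret}\ a]\phi$.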

Formally, a tableau $\mathcal{T} = (T, \lambda)$ is a tree $T$ equipped with a labelling function $\lambda$ mapping each tree node to a triple consisting of a sequent, a rule name (namely the rule applied to this sequent), and a set of triples of shape $(i, F, q)$ where $q$ are literals (that is, atomic propositions in positive or negated form or propositional variables). The triple sets are non-empty only at applications of axiom rules; such leaves are termed contributing, and the set of triples is depicted (by convention) as a premise to the rule. A tableau for formula $\phi$ and method $m$ is a tree with root $\vdash_{(m,\epsilon),\,\emptyset_U,\,\emptyset_C} \phi$ (history stack $(m, \epsilon)$, empty fixed-point stack, empty store) obtained by applying the rules. A tableau is termed maximal if all its leaves are axioms.

If in a tableau there is a leaf node $\vdash_{(i,F)\cdot H,\,U,\,C} \phi$ for which there is an internal node $\vdash_{(i,F')\cdot H',\,U',\,C'} \phi$ such that $F'$ is a prefix of $F$, $U'$ is a suffix of $U$, and $C'$ is a subset of $C$, we term the former node a pseudo-repeat; any node of the latter kind we term a companion. An internal tableau node is said to be stable if all its descendant leaves are axioms or pseudo-repeats. A tableau is called stable if its root node is stable.
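The pseudo-repeat test is a purely syntactic comparison of two sequents, which the following sketch spells out. The `Sequent` record and its field names are assumptions for illustration; the remainder of the history stack is omitted because the definition places no constraint on it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Sequent:
    method: str                 # i, the method of the top history-stack entry
    frame: Tuple[str, ...]      # F, the frame of the top entry
    U: Tuple[str, ...]          # fixed-point stack, top first
    C: FrozenSet                # store of triples
    formula: str                # the formula of the sequent


def is_prefix(xs: Tuple, ys: Tuple) -> bool:
    return len(xs) <= len(ys) and ys[:len(xs)] == xs


def is_suffix(xs: Tuple, ys: Tuple) -> bool:
    return len(xs) <= len(ys) and ys[len(ys) - len(xs):] == xs


def is_pseudo_repeat(leaf: Sequent, internal: Sequent) -> bool:
    """True if `leaf` is a pseudo-repeat with `internal` as companion."""
    return (leaf.formula == internal.formula
            and leaf.method == internal.method
            and is_prefix(internal.frame, leaf.frame)
            and is_suffix(internal.U, leaf.U)
            and internal.C <= leaf.C)


# Sequents taken from the tableau of Fig. 3 (encoded by hand for this sketch).
leaf = Sequent("a", ("X1", "X2", "X3", "b", "X5"), ("X=phi",),
               frozenset({("b", ("X1", "X2"), "~r")}), "X")
companion = Sequent("a", (), ("X=phi",), frozenset(), "X")
```

Whether a pseudo-repeat is an actual repeat is decided separately by the repeat conditions of Section 3.2.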

Tableau construction proceeds as follows. First, a minimal stable tableau is computed, i.e., if a node is a pseudo-repeat, it is not further explored. If all pseudo-repeats in this tableau satisfy some repeat condition for any of their companions (see below), the tableau is maximal and construction is complete. Otherwise, all pseudo-repeats that do not satisfy any of the repeat conditions are simultaneously unfolded, using a breadth-first exploration strategy, and tableau construction continues until the tableau is stable again, upon which the check of the repeat conditions is repeated. As discussed below, in Section 3.5, this process is guaranteed to terminate, resulting in a finite maximal tableau.


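Each rule is written with its goal sequent above the line and its subgoals (or, for axioms, the contributed triple set or '$-$') below; side conditions follow the rule.

$$(p)\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ p}{\{(i,F,p)\} \cup \{(i',F',\mathit{ff}) \mid (i',F') \in H\} \cup C}$$

$$(\neg p)\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ \neg p}{\{(i,F,\neg p)\} \cup \{(i',F',\mathit{ff}) \mid (i',F') \in H\} \cup C}$$

$$(\nu X)\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ \nu X.\phi}{\vdash_{(i,F)\cdot H,\,(X = \nu X.\phi)\circ U,\,C}\ X}$$

$$(X\ \mathsf{unf})\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ X}{\vdash_{(i,F\cdot S(X))\cdot H,\,U,\,C}\ \phi}\qquad (X = \nu X.\phi) \in U$$

$$(\wedge)\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ \phi_1 \wedge \phi_2}{\vdash_{(i,F\cdot S(\phi_1\wedge\phi_2))\cdot H,\,U,\,C}\ \phi_1 \qquad \vdash_{(i,F\cdot S(\phi_1\wedge\phi_2))\cdot H,\,U,\,C}\ \phi_2}$$

$$(\tau)\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ [\tau]\phi}{\vdash_{(i,F\cdot S([\tau]\phi)\cdot\varepsilon)\cdot H,\,U,\,C}\ \phi}$$

$$(\mathsf{call0})\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ [a\ \mathsf{call}\ b]\phi}{-}\qquad i \neq a$$

$$(\mathsf{call1})\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ [a\ \mathsf{call}\ b]\phi}{\vdash_{(b,\epsilon)\cdot(i,F\cdot S([a\ \mathsf{call}\ b]\phi)\cdot b)\cdot H,\,U,\,C}\ \phi}\qquad i = a$$

$$(\mathsf{ret0})\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ [a\ \mathsf{ret}\ b]\phi}{-}\qquad \neg\mathsf{Ret}(i,a,b,H)$$

$$(\mathsf{ret1})\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ [a\ \mathsf{ret}\ b]\phi}{\vdash_{H,\,U,\,C\cup\{(i,F,\neg r)\}}\ \phi}\qquad \mathsf{Ret}(i,a,b,H)$$

$$(\mathsf{IRep})\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ \phi}{\{(i,F,S(\phi))\} \cup C}\qquad \mathsf{IntRep}(S(\phi),\,(i,F)\cdot H)$$

$$(\mathsf{CRep})\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ \phi}{-}\qquad \mathsf{CallRep}(S(\phi),\,(i,F)\cdot H,\,c)$$

$$(\mathsf{RRep})\quad \frac{\vdash_{(i,F)\cdot H,\,U,\,C}\ \phi}{-}\qquad \mathsf{RetRep}(S(\phi),\,(i,F)\cdot H,\,c)$$

Fig. 2. Tableau system.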

3.2. Repeat conditions

We now formulate the three repeat conditions used in the tableau system, giving rise to three types of repeat nodes. Only repeats of the first type, i.e., internal repeats, contribute to triples, giving rise to recursion in structural formulae. In contrast, the other two repeat conditions only recognise that a similar situation has been reached before, and thus no new constraints will be obtained by further exploration. The first repeat condition requires merely the examination of the top frame of the history stack of the current sequent; the second one requires the examination of the whole path from the root to the pseudo-repeat; while the third one requires the examination of all remaining paths. Section 3.4 illustrates the use of these repeat conditions on several examples.

Internal repeat. Tableau construction guarantees that every tableau node of shape $\vdash_{H'\cdot(i,F'\cdot S(\phi)\cdot F'')\cdot H'',\,U,\,C} \phi$ possesses an ancestor node $\vdash_{(i,F')\cdot H'',\,U',\,C'} \phi$ such that $U'$ is a suffix of $U$ and $C'$ is a subset of $C$. As a consequence, every node of shape $\vdash_{(i,F'\cdot S(\phi)\cdot F'')\cdot H,\,U,\,C} \phi$ is a pseudo-repeat (with some ancestor node of shape $\vdash_{(i,F')\cdot H,\,U',\,C'} \phi$ as companion); such pseudo-repeats are termed internal repeats. Intuitively, an internal repeat indicates that a regularity in the structure of method $i$ has been discovered, and thus this regularity should be reflected in the structural formulae. Therefore, in this case $(i,\ F'\cdot S(\phi)\cdot F'',\ S(\phi))$ is added to the triple set of the IRep axiom. Example 14 illustrates the use of the internal repeat condition.

Call repeat. A pseudo-repeat $\vdash_{(i,F)\cdot H,\,U,\,C} \phi$, which has an ancestor node as companion but is not an internal repeat, is a call repeat if $H$ matches the call stack of the companion up to the latter's return depth (where matching means that the same methods are on the stack, with identical frames); in the special case where both stacks are shorter than the return depth, they have to be identical.

The return depth of a companion node is only taken into account if its sub-tableau is complete, i.e., the corresponding pseudo-repeat is its only open branch. There is one exception: when we construct a tableau for a formula with multiple fixed-points, it can happen that two or more pseudo-repeats occur in the sub-tableaux of each other's companions. If all pseudo-repeats are call repeats for the current return depths of the companions, exploration terminates; otherwise, by virtue of the tableau construction, we know that the pseudo-repeats that are not call repeats will never become such when continuing the tableau construction. Therefore, we can explore these nodes further until we break the mutual dependency. The return depth of a tableau node $n$, denoted as $\rho(n)$, is defined as the maximal difference between the number of applied return rules and the number of applied call rules on any path from $n$ to a descendant node. Formally:

$$\rho(\epsilon) = 0 \qquad \rho(r\cdot\delta) = \begin{cases} \rho(\delta) + 1 & \text{if } r \in \{\mathsf{ret0}, \mathsf{ret1}\} \\ \rho(\delta) - 1 & \text{if } r \in \{\mathsf{call0}, \mathsf{call1}\} \\ \rho(\delta) & \text{otherwise} \end{cases}$$

$$\rho(n) = \max\bigl(\{\rho(\mathit{rules}(\pi)) \mid \pi \text{ a path from } n \text{ to a descendant node}\} \cup \{0\}\bigr)$$

where $r$ and $\delta$ range over rule names and sequences of rule names, respectively, while $\mathit{rules}(\pi)$ denotes the sequence of rule names along a tableau path $\pi$. Example 16 illustrates the use of the call repeat condition and why it is necessary to consider the return depth.
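The definition of the return depth can be read as a maximum over running counts, as in the following sketch. It assumes (for illustration) that the sub-tableau below a node is given as a list of rule-name sequences, one per path from the node to a leaf, and that every non-empty prefix of such a sequence corresponds to a path to some descendant; the function names are hypothetical.

```python
from typing import Iterable, Sequence

RETURN_RULES = {"ret0", "ret1"}
CALL_RULES = {"call0", "call1"}


def rho_seq(rules: Sequence[str]) -> int:
    """rho on a sequence of rule names: +1 per return rule, -1 per call rule."""
    total = 0
    for r in rules:
        if r in RETURN_RULES:
            total += 1
        elif r in CALL_RULES:
            total -= 1
    return total


def return_depth(leaf_paths: Iterable[Sequence[str]]) -> int:
    """rho(n): maximum of rho over all paths from n to a descendant, or 0."""
    best = 0
    for path in leaf_paths:
        for k in range(1, len(path) + 1):   # every prefix ends at a descendant
            best = max(best, rho_seq(path[:k]))
    return best


balanced = return_depth([["unf", "and", "call1", "unf", "and", "ret1"]])
unmatched = return_depth([["unf", "and", "ret0"]])
```

A path on which every return is preceded by a matching call never rises above zero, whereas a single unmatched return rule already yields return depth 1.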

Return repeat. A pseudo-repeat is called a return repeat if it has a companion on a different path from the root, such that its history stack is identical to the one of the companion. Example 17 contains a return repeat (in combination with an internal and a call repeat).


Formally, the repeat conditions are defined as follows, where $X$ is $S(\phi)$, $c$ is the companion node of the pseudo-repeat, and $H_c$ is the history stack at $c$.

$$\mathsf{IntRep}(X,\ (i,F)\cdot H) \iff X \in F$$

$$\mathsf{CallRep}(X,\ (i,F)\cdot H,\ c) \iff X \notin F \ \wedge\ \mathit{take}(\rho(c)+1,\ (i,F)\cdot H) = \mathit{take}(\rho(c)+1,\ H_c)$$

$$\mathsf{RetRep}(X,\ (i,F)\cdot H,\ c) \iff (i,F)\cdot H = H_c$$
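The three conditions translate almost literally into code. The sketch below assumes that history stacks are lists of `(method, frame)` pairs with the top entry first, that the return depth $\rho(c)$ of the companion is supplied as an integer, and that $\mathit{take}(n, H)$ returns the first $n$ entries of $H$ (or all of $H$ if it is shorter); the function names and the frame contents in the sample stacks are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[str, ...]
Entry = Tuple[str, Frame]
Stack = List[Entry]


def take(n: int, stack: Stack) -> Tuple[Entry, ...]:
    """First n entries of the stack, or the whole stack if it is shorter."""
    return tuple(stack[:n])


def int_rep(X: str, stack: Stack) -> bool:
    _i, F = stack[0]
    return X in F


def call_rep(X: str, stack: Stack, rho_c: int, H_c: Stack) -> bool:
    _i, F = stack[0]
    return X not in F and take(rho_c + 1, stack) == take(rho_c + 1, H_c)


def ret_rep(X: str, stack: Stack, H_c: Stack) -> bool:
    return tuple(stack) == tuple(H_c)


# Top frame already contains the tag X1: an internal repeat (cf. Fig. 3).
internal = int_rep("X1", [("a", ("X1", "X2", "X3", "b", "X5"))])

# A recursive call a -> a; the lower frame is made up for this illustration.
pseudo = [("a", ()), ("a", ("X1", "X2", "X3", "a"))]
companion = [("a", ())]
```

With return depth 0 only the top entries are compared, so `pseudo` counts as a call repeat of `companion`; with return depth 1 the second entry is compared as well and the condition fails, which is the effect described for Example 16.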

3.3. Structural formulae induced by a tableau

A maximal tableau for $\phi$ and $m$ induces, through the sets of triples accumulated in the leaves, a set of structural formulae $\pi_m(\phi)$ in the following manner:

(1) Let $\mathcal{L}$ be the set of non-empty triple sets collected from the leaves of the tableau. Build a collection of choice sets $\Lambda(\mathcal{L})$, by choosing one triple from each element in $\mathcal{L}$.

(2) For each choice set $\lambda \in \Lambda(\mathcal{L})$,

(a) Group the triples of $\lambda$ according to method names: for each $i \in I^+$, define $\Xi_i = \{(F, q) \mid (i, F, q) \in \lambda\}$.

(b) For each $i \in I^+$ such that $\Xi_i \neq \emptyset$, build a formula $i \Rightarrow \Omega(\Xi_i)$, where

$$\Omega(\Xi) = \bigwedge_{\phi \in \Omega'(\Xi)} \phi$$

$$\begin{aligned} \Omega'(\Xi) =\ & \{[a]\,\Omega(\Xi') \mid a \in I^+ \wedge \Xi' = \{(F, q) \mid (a\cdot F, q) \in \Xi\} \wedge \Xi' \neq \emptyset\}\ \cup \\ & \{\nu X.\,\Omega(\Xi') \mid X \in \mathcal{C} \wedge \Xi' = \{(F, q) \mid (X\cdot F, q) \in \Xi\} \wedge \Xi' \neq \emptyset\}\ \cup \\ & \{q \mid (\epsilon, q) \in \Xi\}. \end{aligned}$$

(c) The induced formula $\chi$ for $\lambda$ is the conjunction of the formulae obtained in the previous step.

(3) The set $\pi_m(\phi)$ of induced formulae is the set of induced formulae for $\lambda \in \Lambda(\mathcal{L})$.

For example, the choice set $\lambda = \{(a,\ X\cdot b,\ \neg r),\ (a,\ X\cdot b,\ X)\}$ induces (by step 2) the structural formula $a \Rightarrow \nu X.[b](\neg r \wedge X)$. Notice that all induced formulae are closed and guarded whenever the original behavioural one is. Notice further that in the second line of the definition of $\Omega'(\Xi)$, the propositional constants $X \in \mathcal{C}$ are actually replaced by corresponding fresh propositional variables.
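Step (2) of the construction is a recursion on the frames, peeling off one leading element at a time. The sketch below renders formulae as ASCII strings (`~` for negation, `&` for conjunction, `=>` for implication, `nu` for the fixed-point binder). The function names, the string rendering, and the `labels` parameter (the edge labels allowed inside box modalities, supplied by the caller) are assumptions for illustration; the replacement of constants by fresh variables is not modelled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Frame = Tuple[str, ...]
Triple = Tuple[str, Frame, str]   # (method, frame, literal)
Pair = Tuple[Frame, str]          # (frame, literal)


def omega(xi: Set[Pair], labels: Set[str], constants: Set[str]) -> str:
    """Conjunction of the formulae obtained from the non-empty set xi."""
    conjuncts = []
    for head in sorted({F[0] for F, _ in xi if F}):
        # Pairs whose frame starts with `head`, with that element removed.
        rest = {(F[1:], q) for F, q in xi if F and F[0] == head}
        body = omega(rest, labels, constants)
        if head in constants:
            conjuncts.append(f"nu {head}.({body})")
        elif head in labels:
            conjuncts.append(f"[{head}]({body})")
        else:
            raise ValueError(f"unknown frame element: {head}")
    conjuncts.extend(sorted(q for F, q in xi if not F))  # literals at empty frames
    return " & ".join(conjuncts)


def induced(lam: Iterable[Triple], labels: Set[str], constants: Set[str]) -> str:
    """Induced formula for one choice set: conjunction of i => Omega(Xi_i)."""
    groups: Dict[str, Set[Pair]] = defaultdict(set)
    for i, F, q in lam:
        groups[i].add((F, q))
    parts = [f"{i} => {omega(groups[i], labels, constants)}" for i in sorted(groups)]
    if not parts:
        return "tt"
    return parts[0] if len(parts) == 1 else " & ".join(f"({p})" for p in parts)


example = induced([("a", ("X", "b"), "~r"), ("a", ("X", "b"), "X")],
                  labels={"a", "b"}, constants={"X"})
```

For the choice set of the example above, `example` is the string `a => nu X.([b](X & ~r))`, which corresponds to $a \Rightarrow \nu X.[b](\neg r \wedge X)$ up to the order of conjuncts.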

Computing this set of induced formulae can take time exponential in the number of contributing leaves, since a choice set selects one triple from every non-empty triple set. Therefore, our current implementation performs some ad hoc simplifications, but more optimisations are possible. For instance, since logically subsumed formulae are redundant in the characterisation, the construction of choice sets can be optimised as follows: if a triple is picked from a contributing leaf, then the same triple must be selected from all other contributing leaves containing it.
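The effect of this optimisation can be seen on a small sketch that enumerates choice sets with and without the restriction. Triples are represented by opaque placeholder strings, and the function names are hypothetical; this is not the simplification procedure of the prototype implementation.

```python
from itertools import product
from typing import FrozenSet, Hashable, Sequence, Set


def choice_sets(leaves: Sequence[Sequence[Hashable]]) -> Set[FrozenSet]:
    """All choice sets: one triple picked from each contributing leaf."""
    return {frozenset(choice) for choice in product(*leaves)}


def consistent_choice_sets(leaves: Sequence[Sequence[Hashable]]) -> Set[FrozenSet]:
    """Choice sets in which a picked triple is picked from every leaf containing it."""
    result = set()
    for choice in product(*leaves):
        picked = set(choice)
        # For each leaf, the only picked triple it contains must be its own choice.
        if all(picked & set(leaf) == {c} for leaf, c in zip(leaves, choice)):
            result.add(frozenset(picked))
    return result


# Two contributing leaves sharing the triple "t3" (placeholders for real triples).
leaves = [["t1", "t3"], ["t2", "t3"]]
all_sets = choice_sets(leaves)
kept_sets = consistent_choice_sets(leaves)
```

Here four choice sets shrink to two: the sets that mix the shared triple with another triple are dropped, since the formulae they induce are subsumed by the one induced by the shared triple alone.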

3.4. Examples

To illustrate the tableau construction, we discuss several examples. First we show the tableau that corresponds to the symbolic execution that is informally described in Example 12. Then we illustrate how an internal repeat gives rise to a structural formula with fixed point. The next example shows why the repeat condition has to be tested on all subformulae, and not only on fixed points. Then the use of the return depth for the call repeat is illustrated by the following example. Finally, the last example illustrates the use of all three different repeat conditions.

Example 13. Fig. 4 shows the tableau for the formula $\nu X.\,\neg r \wedge [\tau]X$ used in Example 12 to illustrate symbolic execution informally.

Example 14. Consider the following behavioural formula: $\phi = \nu X.\,[a\ \mathsf{call}\ b]X \wedge [b\ \mathsf{ret}\ a](\neg r \wedge X)$. Fig. 3 shows the mapping $S$ from the subformulae of $\phi$ to propositional constants, and the tableau that is constructed for this formula and method $a$. The first node where a triple is produced is the one labelled ret1; the triple is then propagated to the two leaves that result from application of the rule for atomic propositions and of the internal repeat rule (IRep), respectively.

The tableau has two leaves with non-empty triple sets; $\mathcal{L}$ thus consists of two sets of two triples each. Thus, to construct the set of structural formulae, we compute structural formulae for the four choice sets resulting from $\mathcal{L}$:

$\{(a,\ X_1\cdot X_2\cdot X_3\cdot b\cdot X_5,\ \neg r),\ (a,\ X_1\cdot X_2\cdot X_3\cdot b\cdot X_5,\ X_1)\}$

$\{(a,\ X_1\cdot X_2\cdot X_3\cdot b\cdot X_5,\ \neg r),\ (b,\ X_1\cdot X_2,\ \neg r)\}$

$\{(b,\ X_1\cdot X_2,\ \neg r),\ (a,\ X_1\cdot X_2\cdot X_3\cdot b\cdot X_5,\ X_1)\}$

$\{(b,\ X_1\cdot X_2,\ \neg r)\}$.

The first set yields the structural formula $a \Rightarrow \nu X_1.\nu X_2.\nu X_3.[b]\,\nu X_5.(\neg r \wedge X_1)$, which simplifies to $\chi_1 = a \Rightarrow \nu X.[b](\neg r \wedge X)$. The last set gives rise to the formula (after simplification) $\chi_2 = b \Rightarrow \neg r$. The formulae constructed from the second and third set are subsumed by (i.e. imply) $\chi_2$, and hence $\pi_a(\phi) = \{\chi_1, \chi_2\}$. For $\phi$ and method $b$ there is a single tableau, which has no leaf triples, and hence $\pi_b(\phi) = \{\mathit{tt}\}$. Thus, $\Pi(\phi) = \{\chi_1, \chi_2\}$, i.e., the behavioural formula $\phi$ is satisfied by every flow graph for which either no return node in method $a$ is reachable via $b$-edges only, or method $b$'s entry node is not a return node.


[Figure: mapping $S$ and tableau tree; only the recoverable labels are listed.]

Mapping $S$ from subformulae to propositional constants:
$X_0$: $\nu X.\,[a\ \mathsf{call}\ b]X \wedge [b\ \mathsf{ret}\ a](\neg r \wedge X)$
$X_1$: $X$
$X_2$: $[a\ \mathsf{call}\ b]X \wedge [b\ \mathsf{ret}\ a](\neg r \wedge X)$
$X_3$: $[a\ \mathsf{call}\ b]X$
$X_4$: $[b\ \mathsf{ret}\ a](\neg r \wedge X)$
$X_5$: $\neg r \wedge X$
$X_6$: $\neg r$

Contributing leaves of the tableau: the $\neg r$ axiom with triples $(a,\ X_1\cdot X_2\cdot X_3\cdot b\cdot X_5,\ \neg r)$ and $(b,\ X_1\cdot X_2,\ \neg r)$, and the IRep axiom with triples $(a,\ X_1\cdot X_2\cdot X_3\cdot b\cdot X_5,\ X_1)$ and $(b,\ X_1\cdot X_2,\ \neg r)$.

Fig. 3. Tableau for $\phi = \nu X.\,[a\ \mathsf{call}\ b]X \wedge [b\ \mathsf{ret}\ a](\neg r \wedge X)$ and $a$, giving rise to $\{a \Rightarrow \nu X.[b](\neg r \wedge X),\ b \Rightarrow \neg r\}$.


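[Figure: mapping $S$ and tableau tree; only the recoverable labels are listed.]

Mapping $S$ from subformulae to propositional constants:
$X_0$: $\nu X.\,\neg r \wedge [\tau]X$
$X_1$: $X$
$X_2$: $\neg r \wedge [\tau]X$
$X_3$: $\neg r$
$X_4$: $[\tau]X$

Contributing leaves of the tableau: the $\neg r$ axiom with triple $(a,\ X_1\cdot X_2,\ \neg r)$, and the IRep axiom with triple $(a,\ X_1\cdot X_2\cdot X_4\cdot\varepsilon,\ X_1)$.

Fig. 4. Tableau for $\phi = \nu X.\,\neg r \wedge [\tau]X$ and $a$, giving rise to $\{a \Rightarrow \nu X.[\varepsilon]X \wedge \neg r\}$.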

Example 15. Consider the following behavioural formula: $\phi = [a\ \mathsf{call}\ b]\,\nu X.(\neg r \wedge [b\ \mathsf{ret}\ a](\neg r \wedge [a\ \mathsf{call}\ b]X))$. Fig. 5 shows the tableau that is constructed for this formula and method $a$, giving rise to the singleton set of structural formulae $\{(a \Rightarrow [b]\,\nu X.(\neg r \wedge [b]X)) \wedge (b \Rightarrow \neg r)\}$.

When symbolically executing the formula, the fixed point is unfolded in method $b$. However, the subsequent return from $b$ to $a$ removes the frame related to this call to $b$, and thus destroys the tag recording that the fixed point was unfolded. Still, because we tag occurrences of all subformulae, the tableau construction recognises the repeat after returning from $b$ to $a$.

Example 16. Consider the following behavioural formula: $\phi = \nu X.\,[a\ \mathsf{call}\ b]\mathit{ff} \wedge [a\ \mathsf{call}\ a]X \wedge [a\ \mathsf{ret}\ a]X$. Fig. 6 shows the tableau that is constructed for this formula and method $a$, giving rise to the following set of structural formulae: $\{a \Rightarrow \nu X.\,[b]\mathit{ff} \wedge [a]X,\ \ a \Rightarrow [b]\mathit{ff} \wedge \neg r\}$.

During tableau construction, we find that node

(a,ϵ)·(a,X1·XXXa),X=φ,∅C X is a pseudo-repeat with

(a,ϵ),X=φ,∅C X as

companion candidate. However, because of the application of ret0 in the subtree of the companion candidate, the return depth is 1, and thus this pseudo-repeat is not a repeat. After unfolding the fixed point once more, tableau construction terminates. Notice that if the earlier pseudo-repeat had been a repeat, the triple $(a,\ X_1\cdot X_2\cdot X_5,\ \neg r)$ (and therewith the structural formula $a \Rightarrow [b]\mathit{ff} \wedge \neg r$) would not have been found.

Example 17. Finally, the last example concerns a behavioural formula with a return loop within a call loop: $\phi = \nu X.\,(\nu Y.\,[a\ \mathsf{ret}\ a](\neg r \wedge Y)) \wedge [\tau]X \wedge [a\ \mathsf{call}\ a]X$. For termination of the tableau, all three different repeat conditions are necessary. Fig. 7 and the following figures show the tableau that is constructed for this formula and method $a$, giving rise to the following set of structural formulae: $\{a \Rightarrow \nu X.\,\neg r \wedge [\varepsilon]X,\ \ a \Rightarrow \nu X.\,[\varepsilon]X \wedge [a]\neg r\}$.

3.5. Termination

The repeat conditions ensure termination of tableau construction.

Theorem 18. Maximal tableaux are finite.

Proof. Let behavioural formula $\phi$ and method name $m \in I^+$ be given. First, observe that repeat condition IntRep puts a bound on the length of frames in stacks, since: (i) $\phi$ has a finite set of sub-formulae and thus a finite number of propositional constants can occur in frames, (ii) each propositional constant can occur at most once in a frame, and (iii) every method name (or $\epsilon$) in a frame is preceded by some propositional constant. As a consequence, since $\phi$ can only mention finitely many method names, the set of possible stores is also finite, and so is the set of fixed-point stacks.

Further, we shall show below that repeat condition RetRep puts a bound on the return depth of nodes in any tableau for $\phi$ and $m$. Hence, since there is only a finite number of method names and frames that can occur in history stacks, function $\mathit{take}$ in the definition of CallRep takes values from a finite set. As a consequence (since formula $\phi$ has a finite set of sub-formulae and only mentions finitely many method names), there is a bound on the length of paths from the tableau root without reaching a node satisfying one of the repeat conditions: every path eventually reaches (i) an axiom, (ii) an internal repeat, (iii) a return repeat, or else (iv) a pseudo-repeat satisfying repeat condition CallRep.

Boundedness of the return depth is established as follows. Assume on the contrary that no such bound exists. As a consequence, there is no bound on the length of paths having return depth 1 (i.e., where the number of applied return and call rules differs by 1, defined as $\rho$ above) either. That is, for each natural number $k$ there is, in some tableau for $\phi$ and $m$, a tableau node and a path from this node having length $k$ and return depth 1. Notice that every path of the tableau construction can be viewed as an execution of a pushdown automaton where the control state is defined by the formula, fixed-point stack and store of the current sequent (thus from a finite domain), and the stack is defined by the history stack. It can be shown, by referring to the Pumping lemma for context-free languages but viewed from a pushdown automata perspective
