A tutorial for data modeling

Citation for published version (APA):

Aerts, A. T. M., & Hee, van, K. M. (1988). A tutorial for data modeling. (Computing science notes; Vol. 8809). Technische Universiteit Eindhoven.

Document status and date: Published: 01/01/1988

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

A Tutorial for Data Modeling

by

A.T.M. Aerts and K.M. van Hee 88/09

COMPUTING SCIENCE NOTES

This is a series of notes of the Computing Science Section of the Department of Mathematics and Computing Science of Eindhoven University of Technology.

Since many of these notes are preliminary versions or may be published elsewhere, they have a limited distribution only and are not for review.

Copies of these notes are available from the author or the editor.

Eindhoven University of Technology

Department of Mathematics and Computing Science

P.O. Box 513

CONTENTS

1. Introduction
2. Database Scheme
3. State Space
4. Data Definition Language and Query Language
5. Diagram Techniques and Constraints
6. Representations
7. An Example
8. Design Method
9. Standard Constructions
10. Comments and Conclusions

Table 1
References

1. Introduction.

In this paper we will construct a formal framework for defining functional data models. A data model is a representation of the structure of the state of the system under description: the object system. The object system is that part of the real world that we are interested in and want to describe. The data model gives the structure of the data that will be stored and manipulated in the database of the object system. A functional data model distinguishes itself from other data models by the fact that the mathematical notion of function plays a central role.

Functional data models have been studied before in the literature, most notably by David Shipman [Sm81]. In his 1981 paper Shipman builds his model around the basic concepts of entities (the "things that exist" in the object system) and functions (relations between entities). The essential difference between Shipman's data model and ours is that Shipman's functions are multivalued, in the sense that a function applied to an entity from its domain will always yield a set of entities. In our case single entities will be returned. Another difference is that Shipman also treats datatypes such as strings and integers, needed for representing names and numbers, as entities. In our model such objects do not appear at the data model level (except perhaps in some very special cases such as the modeling of a programming language), but only occur when we discuss representational issues.

The construction of the data model will be split up in several steps. The first step in the construction of a data model is to establish the boundaries of the object system (also called the UoD: the Universe of Discourse) and decide what is included in the system and what is not. In the course of this step we make an inventory of the objects that occur in the object system and the functional relations between the various objects. Objects are organised into sets, called categories. Objects having similar properties will belong to the same category. Within a category every object is unique. Functional relations between objects are organised into functions, called properties, between the categories the objects belong to. The information gathered about the objects in the UoD is laid down in the database scheme (see section 2).

In section 3 we present the second step: the construction of the free universe of the database as the set of all possible states of the object system. Not all of these states will actually be allowed to occur in the UoD and the state space will have to be restricted. In order to be able to do so we will in step 3 (section 4) define a data language together with an interpretation for it that will allow the formulation of constraints on the database and queries. The data language has the expressive power of the relational algebra or calculus. Numerical computations are not included. In applications one should use the language in embedded form, i.e. as a sublanguage of a general purpose programming language. In step 4 (section 5) we will present a diagram technique to give a graphical representation of a database scheme and of standard constraints. In section 6 we will elaborate on representational issues. In section 7 we will apply the model to a specific case and illustrate various aspects of it. Then, in section 8, we present a design method which leads to a functional data model. In section 9, we discuss some standard constructions. Finally, in section 10, we discuss the relation of the data model presented here to other data models and make some concluding remarks.

2. Database Scheme.

In this section we present the first step in the construction of a formal framework for defining data models: the structure of a database scheme. At this level we will be concerned only with the logical structure of the database, not with what the database will store at a given point in time. In other words, treating the database as a variable, we will discuss the datatype of this variable, not its value at a given time. We will therefore carry out the discussion in the first steps of the construction in terms of the names of the categories and functions. To avoid confusion, the functional relations between categories will be referred to as properties.

Definition 1. Functional Database Scheme

A functional database scheme is a 5-tuple \( \langle C, P, D, R, V \rangle \), where

- C is a finite set;
- P is a finite set; \( P \cap C = \emptyset \);
- \( D \in P \to C \);
- \( R \in P \to C \);
- V is a set-valued function; \( \mathrm{dom}(V) = C \).

An element \( c \in C \) is called a category; an element \( p \in P \) is called a property. D is called the domain category function; D(p) is the domain category of property p. R is the range category function; R(p) is the range category of property p. V is called the domain function. For \( c \in C \), V(c) is called the domain of category c. It contains the representations of all objects that can possibly be a member of category c. Every object is unique.

This requirement is formally expressed as:

At this point, we will not worry about the way the objects belonging to a category c will be represented. One could use integers or tuples or whatever for representing these objects so long as the representation of each object is unique.

A functional database scheme [see, e.g., BAC69] may be represented by a semantic network, i.e., a directed graph with labeled edges and nodes. For every category \( c \in C \) there is exactly one node in the graph with label c. For every pair of nodes in the graph for categories c and c', such that there is a property p with D(p) = c and R(p) = c', there is an edge in the graph directed from node c to node c'. This edge is labeled with property p. All labels are unique, since \( P \cap C = \emptyset \).

In a database scheme we express two things:

- the classification of the objects of the Universe of Discourse into categories;
- the functional relationships of objects in one category to objects from other categories (or the same category), indicated by the properties.

As a simple example of a functional database scheme represented by a directed graph we give the semantic network of a database for registering exams. Once the courses that will be offered in a given year are known, the dates on which one can take an exam for these courses can be planned. An exam is in this situation regarded as a combination of a date and a course. The set of exams thus gives the dates on which an exam for a given course can be taken. Nodes are represented by boxes, edges by arcs.

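[Figure: semantic network with nodes exam, course, date, cname, year, month and day, and edges for (exam to course), on (exam to date), with (course to cname), y (date to year), m (date to month) and d (date to day).]

Fig. 1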

We see from the graph that courses also have a name and dates are being specified in terms of a year, a month and a day, by means of their properties y, m and d.
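The scheme of Fig. 1 can be written down directly as the 5-tuple of Definition 1. The following Python lines are a minimal sketch only: the class name `Scheme`, the helper `exam_scheme` and the tagged-string domains in `V` are assumptions made for illustration, since the representation of objects is deliberately left open at this point.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    """Functional database scheme <C, P, D, R, V> as in Definition 1."""
    C: set   # category names
    P: set   # property names
    D: dict  # property name -> domain category
    R: dict  # property name -> range category
    V: dict  # category name -> set of admissible object representations

    def __post_init__(self):
        # Side conditions of Definition 1.
        assert not (self.C & self.P), "P and C must be disjoint"
        assert set(self.D) == self.P and set(self.R) == self.P
        assert set(self.D.values()) <= self.C
        assert set(self.R.values()) <= self.C
        assert set(self.V) == self.C


# Edges of Fig. 1 as (property, domain category, range category).
EDGES = [
    ("for", "exam", "course"),
    ("on", "exam", "date"),
    ("with", "course", "cname"),
    ("y", "date", "year"),
    ("m", "date", "month"),
    ("d", "date", "day"),
]


def exam_scheme():
    C = {"exam", "course", "date", "cname", "year", "month", "day"}
    P = {p for p, _, _ in EDGES}
    D = {p: a for p, a, _ in EDGES}
    R = {p: b for p, _, b in EDGES}
    # Hypothetical domains: tagged identifiers such as "exam:0".
    V = {c: {f"{c}:{n}" for n in range(100)} for c in C}
    return Scheme(C, P, D, R, V)


scheme = exam_scheme()
print(scheme.D["with"], "->", scheme.R["with"])  # course -> cname
```

Each entry of `EDGES` corresponds to one labeled edge of the semantic network, so the graph and the tuple carry the same information.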

We will distinguish several types of categories. Similar distinctions have been made in the Entity-Relationship Model of Chen [CHE76], which has found widespread usage (see, e.g., [CHE80], [CHE83] and [DAV83]; for an overview of data models, see [TSI82]). Since the ideas are rather similar we will use the same terminology, although the reader should be aware of the fact that the underlying mathematical notions are different. The first type is defined on the basis of scheme properties:

Definition 2. Attributes

Let \( F = \langle C, P, D, R, V \rangle \) be a functional database scheme. A category \( c \in C \) is called an attribute category if and only if

\[ \forall p \in P : c \neq D(p) \]

Hence attribute categories label nodes without outgoing edges. Therefore, these categories don't have properties of their own. Their role is to give further detail of the categories they are associated with through a property.

In the simple example above, the categories "year", "month", "day" and "cname" are attribute categories.
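Definition 2 amounts to a set difference: the attribute categories are the categories that never occur as a domain category. A small Python sketch, in which the function name `attribute_categories` is an assumption for illustration and the dictionary `D` encodes the edges of Fig. 1:

```python
def attribute_categories(C, D):
    """Categories that are the domain category of no property (Definition 2)."""
    return set(C) - set(D.values())


C = {"exam", "course", "date", "cname", "year", "month", "day"}
D = {"for": "exam", "on": "exam", "with": "course",
     "y": "date", "m": "date", "d": "date"}

result = attribute_categories(C, D)
print(sorted(result))  # ['cname', 'day', 'month', 'year']
```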

3. State Space.

The second step in the construction is the definition of the free state space, also called the free universe of a database. A state will be defined as a set valued function with C U P as domain.

Definition 3. Free State Space

Let \( \langle C, P, D, R, V \rangle \) be a functional database scheme. The free state space is the set \( S_f \) of functions with domain \( C \cup P \) such that for \( s \in S_f \):

i) for \( c \in C \): \( s(c) \subseteq V(c) \) and s(c) is finite;
ii) for \( p \in P \): \( s(p) \in s(D(p)) \xrightarrow{*} s(R(p)) \).

Here \( A \xrightarrow{*} B \) stands for the set of all partial functions from A into B.

For a given state s of the database and a \( c \in C \), s(c) will be called the state of category c. Similarly, for a \( p \in P \), s(p) will be called the state of property p.

Requirement i) then says that in each state only objects from the domain may belong to the state of the category. Furthermore, the number of objects in any state of category c is finite. Requirement ii) says that the state of a property is a partial function from the domain category to the range category. We see that

\[ \mathrm{dom}(s(p)) \subseteq s(D(p)) \subseteq V(D(p)) \quad \text{and} \quad \mathrm{rng}(s(p)) \subseteq s(R(p)) \subseteq V(R(p)). \]
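The two requirements of Definition 3 can be checked mechanically. The sketch below assumes a state is a Python dict that maps each category name to a set and each property name to a dict (a partial function); the function name `is_free_state` and the sample data are hypothetical.

```python
def is_free_state(s, C, P, D, R, V):
    """Check requirements i) and ii) of Definition 3 for a candidate state s."""
    for c in C:
        # i) only objects from the domain V(c); Python sets are finite.
        if not set(s[c]) <= set(V[c]):
            return False
    for p in P:
        # ii) s(p) is a partial function from s(D(p)) into s(R(p)).
        if not set(s[p]) <= set(s[D[p]]):
            return False
        if not set(s[p].values()) <= set(s[R[p]]):
            return False
    return True


C = {"course", "cname"}
P = {"with"}
D = {"with": "course"}
R = {"with": "cname"}
V = {"course": {"c1", "c2"}, "cname": {"Logic", "Algebra"}}

good = {"course": {"c1", "c2"}, "cname": {"Logic"}, "with": {"c1": "Logic"}}
bad = {"course": {"c1"}, "cname": {"Logic"}, "with": {"c1": "Algebra"}}

print(is_free_state(good, C, P, D, R, V))  # True
print(is_free_state(bad, C, P, D, R, V))   # False
```

In `good` the course c2 has no name, which is allowed because property states are partial functions. In `bad` the value "Algebra" lies in V(cname) but not in the state of cname, so requirement ii) fails.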

4. Data Definition and Query Language

The free state space is in general too large, i.e. it contains states that will not or may not occur in the Universe of Discourse. In order to formulate restrictions on the state space and to express views and queries, we define, in the third step of our construction, a first order language. Note that each database has its own language, which differs from the language of other databases only in the constants of the language. Our definition proceeds in the standard way [LEW81, LLO84].

Before we give the definition of the data language we introduce, for later convenience, some notation. The inverse of a function f with domain A and range B is defined by means of the inverse function application symbol "^":

\[ \forall b \in B : f^{\wedge} b = \{ a \in A \mid (a; b) \in f \}. \]

\( f^{\wedge} b \) is also referred to as the complete original of b under f. Given the notions of function and inverse function application, we need to specify the application of a given function to a subset of its domain in order to be able to use function composition. We define:

- \( f.D := \{ e \in B \mid \exists d \in D : (d; e) \in f \} \),
- \( f^{\wedge} E := \bigcup \{ f^{\wedge} e \mid e \in E \} \)

for \( f \in A \to B \) and \( D \subseteq A \) and \( E \subseteq B \). The composition \( g \circ f \) of functions \( f \in A \to B \) and \( g \in B \to C \) is defined as:

\[ g \circ f = \{ (a; c) \mid a \in A \text{ and } c \in C \text{ and } (\exists b \in B : (a; b) \in f \text{ and } (b; c) \in g) \} \]

We also introduce for a set of sets W the generalised (or distributed) union \( \bigcup \), which we define as:

\[ \bigcup W = \{ x \mid \exists A \in W : x \in A \}. \]

\( \bigcup W \) then is the set containing all elements that occur in at least one of the elements of W. With this definition, we write \( \mathbb{D} = \bigcup \{ V(c) \mid c \in C \} \). \( \mathbb{D} \) contains the elements of all domains. We assume that each separate domain has a total ordering. This is the only assumption we make for the domains.
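The operators introduced above translate almost literally into set comprehensions. In this Python sketch a function is represented by its graph, a set of pairs (a, b); the function names and the sample data are assumptions for illustration.

```python
def inverse_apply(f, b):
    """Complete original of b under f: all a with (a, b) in f."""
    return {a for (a, b2) in f if b2 == b}


def image(f, D):
    """Application of f to a subset D of its domain."""
    return {e for (d, e) in f if d in D}


def inverse_image(f, E):
    """Union of the complete originals of the elements of E."""
    return set().union(*(inverse_apply(f, e) for e in E))


def compose(g, f):
    """Composition g after f: (a, c) such that some b links a to c."""
    return {(a, c) for (a, b) in f for (b2, c) in g if b == b2}


def big_union(W):
    """Generalised union of a collection of sets."""
    return set().union(*W)


on = {("e1", "d1"), ("e2", "d1"), ("e3", "d2")}
y = {("d1", 1987), ("d2", 1988)}

print(sorted(inverse_apply(on, "d1")))     # ['e1', 'e2']
print(sorted(image(on, {"e1", "e3"})))     # ['d1', 'd2']
print(sorted(compose(y, on)))              # [('e1', 1987), ('e2', 1987), ('e3', 1988)]
print(sorted(big_union([{1, 2}, {2, 3}]))) # [1, 2, 3]
```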

Definition 4. Data Language

Let \( F = \langle C, P, D, R, V \rangle \) be a functional database scheme. Data language \( L_F \) then consists of the following elements:

i) Alphabet

The alphabet is the union of the following sets of symbols:

- constants: \( \mathbb{D} \cup C \cup P \)
- variables: { X, Y, Z, X1, Y1, Z1, ... }
- function symbols: { ∘, ., ^, dom, rng, ↾ }
- set symbols: { ∪, ∩, \, + }
- atom comparison symbols: { =, ≠, <, >, ≤, ≥ }
- set comparison symbols: { ⊆, ⊈, =, ≠ }
- function comparison symbols: { ⊆, ⊈, =, ≠ }
- atom-set symbols: { ∈, ∉ }
- logical symbols: { and, or, ¬, implies, iff }
- quantors: { A, E, $ }
- interpunction symbols: { [, ], (, ), |, : }

(We note that symbols such as = and ⊆ are being overloaded: they can be used in more than one context. The symbol ↾ will be interpreted as the restriction operator; the symbol + will get the meaning of symmetric difference operator.)

ii) Terms

- every \( a \in \mathbb{D} \) is an a-term
- every \( c \in C \) is an s-term
- every \( p \in P \) is an f-term
- every variable is an a-term
- a-, s- and f-terms are terms
- if f and g are f-terms, then f∘g is an f-term
- if f is an f-term and x is an a-term, then f.x is an a-term and f^x is an s-term
- if f is an f-term, then dom(f) and rng(f) are s-terms
- if f is an f-term and s an s-term, then f↾s is an f-term
- if f is an f-term and s an s-term, then f.s and f^s are s-terms
- if s1 and s2 are s-terms and θ is a set symbol, then s1 θ s2 is an s-term
- if X is a variable, c an element of C and q a predicate, then $[ X : c | q ] is an s-term
- there are no other terms

iii) Predicates

- if a1 and a2 are a-terms and θ is an atom comparison symbol, then a1 θ a2 is a predicate
- if a is an a-term, s an s-term and θ an atom-set symbol, then a θ s is a predicate
- if s1 and s2 are s-terms and θ a set comparison symbol, then s1 θ s2 is a predicate
- if f1 and f2 are f-terms and θ a function comparison symbol, then f1 θ f2 is a predicate
- if q1 and q2 are predicates and θ is a logical symbol, not equal to ¬, then (q1 θ q2) is a predicate
- if q is a predicate, then ¬q is a predicate
- if X is a variable, q a predicate, c an element of C and \( Q \in \{ A, E \} \), then Q[ X : c | q ] is a predicate
- there are no other predicates

iv) Language

The language \( L_F \) is the set of all predicates.

As an example of an element of \( L_F \), consider the following statement, based on the database scheme of Fig. 1:

"Every exam has a course and a date associated with it". This statement is expressed in \( L_F \) as:

A[ e: exam | E[ c: course | for.e = c ] and E[ d: date | on.e = d ] ]

Given the alphabet, the terms and predicates are defined inductively. First, atomic terms are constructed from the constants and variables of the alphabet (for example: for, e and c are atomic terms). Next, already available terms can be combined with suitable symbols to form more complicated terms, such as for.e. From such terms one can, with the help of appropriate symbols, also construct atomic predicates (for.e = c), which in turn can be combined with logical symbols (such as and) to form more complicated predicates. Finally, terms and predicates can be mixed using quantors and variables to yield new terms or predicates, depending on the quantors being used. In the \( L_F \)-expression above, an example of this is given by:

E[ c: course | for.e = c ].
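The nesting of quantors in the example statement maps onto nested `all` and `any` in Python. This is a sketch under assumptions: a state is a dict of sets (categories) and dicts (properties), and `dict.get` returning `None` plays the role of an undefined function application, which makes the comparison fail.

```python
def every_exam_has_course_and_date(s):
    """A[ e: exam | E[ c: course | for.e = c ] and E[ d: date | on.e = d ] ]"""
    return all(
        any(s["for"].get(e) == c for c in s["course"])
        and any(s["on"].get(e) == d for d in s["date"])
        for e in s["exam"]
    )


state = {
    "exam": {"e1", "e2"},
    "course": {"c1"},
    "date": {"d1", "d2"},
    "for": {"e1": "c1", "e2": "c1"},
    "on": {"e1": "d1"},  # e2 has no date yet
}

print(every_exam_has_course_and_date(state))  # False
state["on"]["e2"] = "d2"
print(every_exam_has_course_and_date(state))  # True
```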

The expressions obtained this way may become quite lengthy. To make the language a little friendlier to the user, one often "sugars" the language a bit by allowing for abbreviation mechanisms such as "where" clauses and argument lists that enable one to break up long definitions into smaller pieces. Our language has some redundancy; for example, ≠ could be omitted since we also have ¬ and =.

A quantor binds a variable to an s-term, thereby specifying which a-terms one can substitute for that variable. The range of validity of this binding, called the scope of the quantor, is specified by a pair of square brackets [ and ], which are called the scope brackets. The first of these, the opening bracket, is the first symbol appearing after the quantor symbol. The occurrence of a variable is bound by a quantor if and only if it appears in the scope of that quantor and one of the following conditions is satisfied: the occurrence follows the opening bracket ([) immediately, or there is another occurrence of the same variable that follows the opening bracket immediately. When a variable is not bound by any quantor, we speak of a free occurrence of that variable. More loosely, the variables are dubbed bound and free respectively. The s-term that a variable X is bound to is called the domain of that variable.

As an example consider the following predicate:

A[ x: a | E[ y: b | x ∈ $[ x: c | f.x = g.y ] ] ]

In this predicate, the first and second occurrence of the variable x are bound by the A-quantor; they have domain a. Although the third and fourth occurrence of x also fall within the scope of this A-quantor (they appear within the outermost pair of square brackets), they fall within the scope of the $-quantor (since they appear within the innermost pair of square brackets) as well and are bound by the latter quantor, since the third occurrence of x immediately follows the opening bracket of this quantor. The y-occurrences, with domain b, are bound by the E-quantor. The use of variables, as displayed in the predicate above, is not recommended. One should use different symbols for different variables as much as possible.

Note that the language defined above is generated by a context free grammar. The symbols in the alphabet are the terminal symbols in the alphabet of the grammar. The terms and predicates correspond to the nonterminal symbols and parts ii) and iii) of definition 4 correspond to the production rules.

An interpretation of the first order language defined above is constructed in the standard way (cf. [LEW81, LLO84]). Our interpretation function I will depend on the state of the database, given by \( s \in S \). This dependence will be indicated by subscripting I with s: \( I_s \).

Since predicates may contain, in principle, both bound and free variables, the truth value of a predicate can only be evaluated when we have assigned to each variable in the language a constant from \( \mathbb{D} \). (In practice, it suffices to assign a constant to every variable occurring in the subset of the language that is actually used.) Let A be an assignment function, such that A assigns constants x to X and y to Y. A predicate p which, for example, depends on variables X and Y will then be interpreted under assignment function A as depending on constants x and y.

It is clear that the interpretation function in general will depend on A. This dependence could also be made explicit by appending another subscript to I. However, as we will only consider languages containing closed predicates, i.e., predicates without free variables, and as the truth value of closed predicates does not depend on the particular choice of the assignment function A, the interpretation turns out to be independent of the assignment and an extra subscript is not needed.

For symbols from the alphabet, other than the constants and variables, we will denote the interpretation using the underscored version of the symbol: if θ is such a symbol, then \( \underline{\theta} \) stands for \( I_s(\theta) \). \( \underline{\theta} \) will have its usual mathematical meaning, e.g. for θ = ∘, \( \underline{\circ} \) stands for the function composition operator defined above.

We will use the following notation for substitution: let p be a term or a predicate, then \( p^X_y \) denotes the term or predicate where each free occurrence of X is replaced by y. For instance, one has

\[ ( f.x = a \text{ or } E[\, x: b \mid g.x = a \,] )^x_y \;=\; f.y = a \text{ or } E[\, x: b \mid g.x = a \,] \]

Definition 5. Interpretation

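Let \( F = \langle C, P, D, R, V \rangle \) be a functional database scheme with data language \( L_F \). Let LT be the set of terms without free variables and LC the set of closed predicates (constraints). Furthermore, let \( S_f \) be the free universe of this database scheme. The interpretation function I for predicates and terms without free variables satisfies:

- \( \mathrm{dom}(I) = S_f \times (LT \cup LC) \)
- for \( x \in \mathbb{D} \): \( I_s(x) = x \)
- for \( x \in C \): \( I_s(x) = s(x) \)
- for \( x \in P \): \( I_s(x) = s(x) \)
- for f and g f-terms: \( I_s(f \circ g) = I_s(f) \mathbin{\underline{\circ}} I_s(g) \)
- for f an f-term, a an a-term and \( \theta \in \{ ., \wedge \} \): \( I_s(f \theta a) = I_s(f) \mathbin{\underline{\theta}} I_s(a) \) *)
- for f an f-term and \( \theta \in \{ \mathrm{dom}, \mathrm{rng} \} \): \( I_s(\theta(f)) = \underline{\theta}(I_s(f)) \)
- for f an f-term, \( \sigma \) an s-term and \( \theta \in \{ \upharpoonright, \wedge, . \} \): \( I_s(f \theta \sigma) = I_s(f) \mathbin{\underline{\theta}} I_s(\sigma) \)
- for \( \sigma_1 \) and \( \sigma_2 \) s-terms and \( \theta \in \{ \cup, \cap, \setminus, + \} \): \( I_s(\sigma_1 \theta \sigma_2) = I_s(\sigma_1) \mathbin{\underline{\theta}} I_s(\sigma_2) \)
- for X a variable, \( c \in C \) and q a predicate: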

III

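- for \( a_1 \) and \( a_2 \) a-terms and \( \theta \in \{ =, \neq, <, \leq, >, \geq \} \): \( I_s(a_1 \theta a_2) = \text{true} \) if and only if \( I_s(a_1) \mathbin{\underline{\theta}} I_s(a_2) \) holds *)
- for \( \sigma \) an s-term and a an a-term and \( \theta \in \{ \in, \notin \} \): \( I_s(a \theta \sigma) = \text{true} \) if and only if \( I_s(a) \mathbin{\underline{\theta}} I_s(\sigma) \) holds *)
- for \( \sigma_1 \) and \( \sigma_2 \) s-terms and \( \theta \in \{ \subseteq, \nsubseteq, =, \neq \} \): \( I_s(\sigma_1 \theta \sigma_2) = \text{true} \) if and only if \( I_s(\sigma_1) \mathbin{\underline{\theta}} I_s(\sigma_2) \) holds
- for \( f_1 \) and \( f_2 \) f-terms and \( \theta \in \{ \subseteq, \nsubseteq, =, \neq \} \): \( I_s(f_1 \theta f_2) = \text{true} \) if and only if \( I_s(f_1) \mathbin{\underline{\theta}} I_s(f_2) \) holds
- for \( q_1 \) and \( q_2 \) predicates and \( \theta \in \{ \text{and}, \text{or}, \text{implies}, \text{iff} \} \): \( I_s(q_1 \theta q_2) = \text{true} \) if and only if \( I_s(q_1) \mathbin{\underline{\theta}} I_s(q_2) \) holds
- for q a predicate: \( I_s(\neg q) = \text{true} \) if and only if \( \neg I_s(q) \) holds
- for X a variable, \( \sigma \) an s-term and q a predicate: \( I_s(E[\, X : \sigma \mid q \,]) = \text{true} \) if and only if there exists a \( y \in I_s(\sigma) \) such that \( I_s(q^X_y) \) holds
- for X a variable, \( \sigma \) an s-term and q a predicate: \( I_s(A[\, X : \sigma \mid q \,]) = \text{true} \) if and only if for all \( y \in I_s(\sigma) \), \( I_s(q^X_y) \) holds
- \( I_s \) yields false in all other cases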

ad *): it will sometimes happen that while interpreting an a-term we find it is not defined. For instance, when a function (an f-term) is applied to a constant (an a-term) which is not a member of its domain, the result, an a-term, is undefined. We will add a special constant, undef, to \( \mathbb{D} \) to account for this situation. Whenever undef enters in the interpretation of an a-term, the a-term is interpreted as undefined and is assigned the value undef. When undef enters in the interpretation of a predicate, the entire predicate is assigned the value false. A similar problem for s-terms doesn't arise because a set without any elements is a valid set.
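The treatment of undef can be sketched in a few lines of Python. The sentinel `UNDEF` and the helpers `apply` and `atom_compare` are hypothetical names; property states are assumed to be dicts. The sketch covers atomic comparisons only, which is the case the remark above spells out.

```python
import operator

UNDEF = object()  # stands for the special constant undef


def apply(f, x):
    """f.x for a property state f given as a dict; undef propagates."""
    if x is UNDEF:
        return UNDEF
    return f.get(x, UNDEF)


def atom_compare(a, b, op):
    """Atomic predicate 'a op b'; false as soon as undef is involved."""
    if a is UNDEF or b is UNDEF:
        return False
    return op(a, b)


for_ = {"e1": "c1"}       # exam e2 has no course
with_ = {"c1": "Logic"}

name_e1 = apply(with_, apply(for_, "e1"))
name_e2 = apply(with_, apply(for_, "e2"))

print(atom_compare(name_e1, "Logic", operator.eq))  # True
print(atom_compare(name_e2, "Logic", operator.eq))  # False
print(atom_compare(name_e2, "Logic", operator.ne))  # False
```

Note that both the equality and the inequality involving the undefined term come out false, so the two are not each other's negation once undef is involved.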

The Closed World Assumption is implicit in our definition of the interpretation function. When it does not follow from the state s of the database that, e.g., \( x \in a \) for some set a and element x, then it is implied that \( x \notin a \).

It is straightforward to verify by induction on the number of quantors that every term and predicate without free occurrences of variables has an interpretation according to the rules specified in definition 5.

It also follows directly from definition 5 that for each s-term st without free variables there is a category index c such that \( \forall s \in S : I_s(st) \subseteq I_s(c) \). Hence each interpretation of an s-term without free variables equals a finite set.

The data language \( L_F \) does not recognize any structure in the elements of \( \mathbb{D} \). Therefore, operations such as projection cannot be expressed in it.

Now we are able to define constraints on the free state space. This is the fourth step.

Definition 6. Constraints

For a functional database scheme \( F = \langle C, P, D, R, V \rangle \) with data language \( L_F \) the constraints are closed predicates.

Definition 7. State Space

Let \( \langle C, P, D, R, V \rangle \) be a functional database scheme and let SoC be a set of constraints, then the state space S is

\[ S = \{\, s \in S_f \mid A[\, co : SoC \mid I_s(co) = \text{true} \,] \,\} \]

where I is the interpretation induced by \( \langle C, P, D, R, V \rangle \).

It is required that the updates of a database keep the constraints invariant.
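One way to read this requirement operationally is to evaluate every constraint on the candidate state and reject the update when any of them fails. The sketch below assumes constraints are Python functions from a state to a boolean; the names `in_state_space`, `guarded_update` and `add_exam` are hypothetical.

```python
def in_state_space(s, constraints):
    """A state belongs to the state space when every constraint holds in it."""
    return all(co(s) for co in constraints)


def guarded_update(s, update, constraints):
    """Apply an update only if the resulting state satisfies all constraints."""
    candidate = update(s)
    if not in_state_space(candidate, constraints):
        raise ValueError("update rejected: a constraint would be violated")
    return candidate


def every_exam_has_course(s):
    return all(e in s["for"] for e in s["exam"])


def add_exam(e, course=None):
    def update(s):
        new = {"exam": s["exam"] | {e}, "for": dict(s["for"])}
        if course is not None:
            new["for"][e] = course
        return new
    return update


s0 = {"exam": {"e1"}, "for": {"e1": "c1"}}
s1 = guarded_update(s0, add_exam("e2", "c1"), [every_exam_has_course])
print(sorted(s1["exam"]))  # ['e1', 'e2']

try:
    guarded_update(s1, add_exam("e3"), [every_exam_has_course])
except ValueError as err:
    print(err)  # update rejected: a constraint would be violated
```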

Now that we have defined a data language, we are able to formulate queries. A query is a set valued function with the free state space as domain. To each state s a set satisfying the query specification is assigned.

Definition 8. Queries and Views

Let \( F = \langle C, P, D, R, V \rangle \) be a functional database scheme with data language \( L_F \). Then a query specification is an s-term without free variables. With each query specification q a function

\[ Q \in S_f \to \mathbb{P}(\mathbb{D}) \]

is associated, such that \( Q(s) = I_s(q) \). This function Q is called a view.

Looking back at definition 4 we see that at the basis of the definition of a set term lies the category. Typically, a set term is obtained by taking a subset of the state of some category. For this to be possible, the category obviously has to be present in the definition of the scheme. This seems to pose a limitation to the kind of queries that one can ask. For instance, when we want to know, in case of the database scheme of section 2, which courses (by name) have been examined in 1987 and in what month, we seem to be able to find out the courses, e.g. by asking

$[ cn: cname | E[ e: exam | with.(for.e) = cn and y.(on.e) = 1987 ] ]

and the months, for instance, by asking

$[ mo: month | E[ e: exam | y.(on.e) = 1987 and m.(on.e) = mo ] ]

but not the combinations of these two. The problem is solved at the database scheme level by adding a new category cinm and two properties: cm1 from cinm to cname and cm2 from cinm to month, and some restrictions to specify the objects in cinm and their relationship to those in course and month.

The scheme becomes:

[Figure: the semantic network of Fig. 1 extended with the category cinm and the properties cm1 (cinm to cname) and cm2 (cinm to month).]

We require the following constraints to be satisfied:

A[ cn: cname | A[ mo: month | E[ cm: cinm | cm1.cm = cn and cm2.cm = mo ] ] ]

A[ c1: cinm | A[ c2: cinm | cm1.c1 = cm1.c2 and cm2.c1 = cm2.c2 implies c1 = c2 ] ]

In other words, for every combination of a course name and a month there is an object in cinm. Moreover, this object is uniquely identified by its two properties cm1 and cm2. The state of category cinm thus can be thought of as the cartesian product of the states of cname and month.

The query now becomes:

$[ cm: cinm | E[ e: exam | y.(on.e) = 1987 and m.(on.e) = cm2.cm and with.(for.e) = cm1.cm ] ]

It remains to choose a suitable representation for the objects in cinm (see below).
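The combined query can be tried out on a small state. In this Python sketch the objects of cinm are represented as (course name, month) pairs, which is merely one possible choice and an assumption of the example, as are the sample exams and dates.

```python
from itertools import product

s = {
    "exam": {"e1", "e2", "e3"},
    "cname": {"Logic", "Algebra"},
    "month": {1, 6},
    "for": {"e1": "c1", "e2": "c2", "e3": "c1"},
    "on": {"e1": "d1", "e2": "d2", "e3": "d3"},
    "with": {"c1": "Logic", "c2": "Algebra"},
    "y": {"d1": 1987, "d2": 1987, "d3": 1988},
    "m": {"d1": 1, "d2": 6, "d3": 1},
}

# One cinm object for every combination of a course name and a month.
s["cinm"] = set(product(s["cname"], s["month"]))
s["cm1"] = {cm: cm[0] for cm in s["cinm"]}
s["cm2"] = {cm: cm[1] for cm in s["cinm"]}

# $[ cm: cinm | E[ e: exam | y.(on.e) = 1987 and m.(on.e) = cm2.cm
#                            and with.(for.e) = cm1.cm ] ]
result = {
    cm
    for cm in s["cinm"]
    if any(
        s["y"][s["on"][e]] == 1987
        and s["m"][s["on"][e]] == s["cm2"][cm]
        and s["with"][s["for"][e]] == s["cm1"][cm]
        for e in s["exam"]
    )
}

print(sorted(result))  # [('Algebra', 6), ('Logic', 1)]
```

The exam e3 falls in 1988 and therefore contributes nothing, so the pair ('Logic', 1) comes from e1 alone.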

5. Diagram Techniques and Constraints

As stated before, a database scheme may be represented by a directed graph with labeled edges. Although this graph gives a complete representation of the basic structure of the database, it does not allow one to express any semantic information that may be available. In this section we will introduce a diagram technique [see BAC69] which will give us an overview of the most important aspects of the data model, starting from the graphic representation of the database scheme.

Diagrams

As a first step we will introduce special symbols for the various types of categories and functions that we have encountered so far. We start from the scheme and replace every node with the symbol which is appropriate for the category it represents. First, an attribute category will be represented by a circle, other categories by a rectangle. Each symbol will contain the category name. Secondly, every edge stands for a property and will be represented by a continuous line with an arrow in the direction of the corresponding edge. When the situation is clear, for instance when a property has an attribute category for its range, the arrow will be dropped from the line. All lines will be labeled with the corresponding property name. These diagrammatic conventions have been summarised in Table 1.

Standard Constraints

Secondly, we will include special symbols for various standard constraints with regard to the properties in the model. In the following, we will use 'category a' or 'a' to denote the state s(a) of the category with name a, given a database state s. Similarly, we will use 'property p' or 'p' to denote the state s(p) of a property with name p.

We distinguish the following cases:

Definition: Completeness Constraint

Often a property \( p \in P \) with domain category a and range category b is required to be complete, i.e. it is required to satisfy:

A[ x:a | E[ y:b | p.x=y ]]

A property satisfying this constraint is drawn in the diagram with a solid diamond at the base of the property line:

[Diagram: property line p from category a to category b, with a solid diamond at its base.]

The imposition of a completeness requirement on a property is one way of modeling referential integrity. The property then, obviously, represents the fact that for every element of its domain there exists an element in its range. For instance, the date of every exam should be a valid date (falling in one of the reserved periods), and the subject of every exam should be an existing course.
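A completeness check is a direct transcription of the predicate. In this sketch the state of a property is assumed to be a Python dict and the states of the categories are sets; the name `is_complete` and the sample data are hypothetical.

```python
def is_complete(p, a, b):
    """A[ x:a | E[ y:b | p.x = y ]]: p is defined on all of a, with values in b."""
    return all(x in p and p[x] in b for x in a)


exam = {"e1", "e2"}
course = {"c1"}
for_ = {"e1": "c1"}  # the subject of e2 is missing

print(is_complete(for_, exam, course))  # False
for_["e2"] = "c1"
print(is_complete(for_, exam, course))  # True
```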

Definition: Onto Constraint

A function \( p \in P \) with D(p) = a and R(p) = b is said to be from a onto b when the following predicate holds:

A[ y:b | E[ x:a | p.x=y ]]

A property p satisfying this constraint has a solid diamond at the tip of the function arrow.

A function that isn't onto is into.

Definition: One-to-one Constraint

When a property \( p \in P \) with R(p) = b and D(p) = a is one-to-one from a to b it satisfies:

A[ x:dom(p) | A[ y:dom(p) | p.x=p.y implies x=y ]]

Here the values x and y can take have been restricted to come from dom(p). When x 1=

Such a comparison then would not evaluate to true, rendering the interpretation of the entire predicate false.
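The onto and one-to-one constraints can be checked in the same style. This is a sketch assuming property states are Python dicts, so that the keys of the dict are exactly dom(p); the function names and data are hypothetical.

```python
def is_onto(p, b):
    """A[ y:b | E[ x:a | p.x = y ]]: every object of b is the image of some x."""
    return set(b) <= set(p.values())


def is_one_to_one(p):
    """A[ x:dom(p) | A[ y:dom(p) | p.x = p.y implies x = y ]]"""
    images = list(p.values())
    return len(images) == len(set(images))


with_ = {"c1": "Logic", "c2": "Algebra"}
cname = {"Logic", "Algebra", "Topology"}
for_ = {"e1": "c1", "e2": "c1"}

print(is_onto(with_, cname))   # False: no course is named Topology
print(is_one_to_one(with_))    # True
print(is_one_to_one(for_))     # False: e1 and e2 share a course
```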

A property p satisfying a one-to-one constraint appears in the diagram with a cross bar on the function line:

[Diagram: property line p from category a to category b, with a cross bar on the line.]

A property p that is one-to-one from one category to another and also complete, but not onto, can be used to model specialisation (the ISA relationship in the Entity-Relationship model of Chen [CHE76]). An object o in category a then has a unique connection with an object o' from category b and through property p it inherits all the properties of o'. This construction is particularly useful when the objects of category a have all the properties of the objects of category b, but in addition have some properties of their own that are not shared by all objects in category b. As an example consider a category "personnel" of the University, which has properties like "name", "address" and so on. However, some employees belong to the teaching staff and give certain courses, while other employees belong to the administrative staff and are specialised in financial or insurance matters.

Definition: Key Constraint

Consider a set A of properties with the same domain category a: A ⊆ { p ∈ P | D(p) = a }. We call these properties key properties of a when the following key constraint is satisfied:

A[ x:ā | A[ y:ā | (A[ p:A | p.x = p.y ]) implies x = y ] ]

where ā = $[ x:a | A[ p:A | x ∈ dom(p) ] ] holds.
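As an illustration of the key constraint, the sketch below checks it on a small state. It assumes each property is a Python dict (its keys being the domain of the property); the function name `satisfies_key` and the exam data are hypothetical.

```python
def satisfies_key(props, objects):
    """Check the key constraint for a list of properties on category `objects`.

    Only objects for which every property in `props` is defined are compared,
    which mirrors the restricted set used in the definition.
    """
    seen = set()
    for x in objects:
        if not all(x in p for p in props):
            continue  # some key property is undefined for x
        signature = tuple(p[x] for p in props)
        if signature in seen:
            return False  # two distinct objects agree on all key properties
        seen.add(signature)
    return True


# Hypothetical state: three exams with a subject and a date.
exams = ["e1", "e2", "e3"]
subject = {"e1": "DBS1", "e2": "DBS1", "e3": "DBS2"}
date = {"e1": "d1", "e2": "d2", "e3": "d1"}

print(satisfies_key([subject, date], exams))  # True
print(satisfies_key([subject], exams))        # False: e1 and e2 share a subject
```

With a single property in the list, the check reduces to the one-to-one constraint.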

A key constraint is denoted by an arc between the participating properties. Since these properties do not have to correspond to adjacent lines in the diagram, the arc has a symbol "0" on the appropriate function lines. A single category may have more than one set of key properties. Only minimal keys are indicated in the diagram. A set of key properties is minimal when omission of any of the properties from the set yields a set of properties that does not satisfy the key constraint. Consider for example the case that the properties f from a to b and g from a to c are required to be key properties of category a. This is expressed in a diagram as:

Please note that every element in a category has a unique identification. An important difference between this intrinsic identification and identification through a set of key properties is that the intrinsic identification has to be present, otherwise the object does not exist. Key properties only provide unique identification for an object in their domain category when the complete set of objects in the range categories for the object is known. We'll return to the identification issue in more detail when we discuss


When the set A of properties is a singleton, we recover, as a special case of the key constraint, the one-to-one constraint.

When a property p with domain D(p) = a and range R(p) = b is constrained to be complete and one-to-one from a to b, it can be considered as modeling a one-attribute-key constraint. When this property moreover is required to be onto as well, the two categories become equivalent: every element in a then corresponds to precisely one, unique element of b. For example, we can require every student in the database to have a unique student identification number.

[Diagram: shorthand notation for the key constraint on category c with properties e and f, shown next to the notation it abbreviates.]

This convention allows us to indicate only one set of identifying properties. These properties (e and f) form the so-called primary key of the category (c).

Other Common Constraints

There are various other common constraints for which we do not introduce a special notation. These constraints involve several properties and categories. We will give some examples.

Composition Constraint, an example:

Consider categories a, b and c, a property f with D(f) = a and R(f) = b, a property g with D(g) = b and R(g) = c and a property h with D(h) = a and R(h) = c. The three properties satisfy a composition constraint when we require h = g∘f. Informally the constraint means that it does not matter whether we use f and g, or h to find the object o' in c corresponding to an object o in a: both routes through the diagram lead one to the same result. Formally, we have:

A[ x:a | x ∈ dom(f) implies h.x = g.(f.x) ]
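The composition constraint can be tested directly on a state. This is a minimal sketch, assuming properties are Python dicts; it additionally assumes that an undefined h.x or g.(f.x) counts as a violation, which the formula itself leaves open. All names and data are hypothetical.

```python
def satisfies_composition(f, g, h, a):
    """A[ x:a | x in dom(f) implies h.x = g.(f.x) ]."""
    for x in a:
        if x in f:
            # Assumption: both routes must be defined to be comparable.
            if x not in h or f[x] not in g or h[x] != g[f[x]]:
                return False
    return True


# Hypothetical state: f is partial (x3 has no image), g and h are complete.
a = ["x1", "x2", "x3"]
f = {"x1": "y1", "x2": "y2"}
g = {"y1": "z1", "y2": "z1"}
h = {"x1": "z1", "x2": "z1", "x3": "z2"}

print(satisfies_composition(f, g, h, a))  # True
```

Note that h.x3 is known although x3 is outside dom(f), so h carries information that cannot be derived from f and g.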

In a diagram this situation could look like:

[Diagram: categories a, b and c, with property f from a to b, property g from b to c and property h from a to c.]

This data model may seem to have some redundancy, viz. when the composition constraint holds, we could do without, say, property h. However, the diagram shows a case in which properties g and h are required to be complete, but f is not. In other words, for every object x in a, h.x is known, whereas possibly x ∉ dom(f). In such a case h.x cannot be constructed by applying g∘f to x.

Combination Constraint, an example:

Another class of constraints are the so-called combination constraints, which can be used to model mutually exclusive or complementary situations. An example of the former type of constraint is the following case. Consider categories a, b and c and properties f with D(f) = a and R(f) = b and g with D(g) = a and R(g) = c. We could impose the following constraint:

A[ x:a | (E[ y:b | f.x = y ] implies x ∉ dom(g)) and (E[ z:c | g.x = z ] implies x ∉ dom(f)) ]

In a diagram this situation is represented by

[Diagram: category a with property f to category b and property g to category c.]

Category a thus is partitioned into (in this case two) mutually disjoint subsets, each of which is the domain of a separate property. A similar situation can, of course, arise when a is the range category of several properties.
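The mutual exclusion constraint amounts to requiring that dom(f) and dom(g) do not overlap. A small sketch, assuming properties are Python dicts and using hypothetical personnel data:

```python
def mutually_exclusive(f, g, a):
    """True when no object of category a lies in both dom(f) and dom(g)."""
    return all(not (x in f and x in g) for x in a)


# Hypothetical state: f = course taught, g = administrative specialism.
personnel = {"p1", "p2", "p3"}
teaches = {"p1": "DBS1"}
administers = {"p2": "finance"}

print(mutually_exclusive(teaches, administers, personnel))  # True
```

The check only enforces disjointness; an object such as p3, for which neither property is defined, does not violate it.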

Another common combination constraint arises in connection with attribute categories. It often happens that one does not want an attribute category to contain more objects than necessary. An attribute category should contain information about properties of entities and relationships and such information without the link to a corresponding entity or relationship does not seem to make sense. In such a situation one imposes the following combined surjectivity constraint:


In other words: for every object x in category a there is at least one other object y such that x specifies a property of y.

The use of constraints in modeling common situations is further illustrated in section 9.

6. Representations

At this point we have seen two different ways of identifying an object in a category. First of all, every object is unique and therefore can be uniquely identified. But then also, some categories are required to satisfy a primary key constraint. Consequently, the objects in such a category can be identified by means of a set of objects in other categories. In addition to their intrinsic identification, they also have an external identification.

When it comes to choosing a representation for a category we have to make sure that every object is assigned a unique representation. In other words: for all c1, c2 in C with c1 ≠ c2 we require V(c1) ∩ V(c2) = ∅. This is not a very restrictive requirement. If it is preferred to denote objects from two different categories by the same representation, such as in the case of the categories 'day' and 'month' in fig. 1, where we may want to choose V(day) = [0 .. 31] and V(month) = [1 .. 12], one can always add the category name as a prefix to distinguish them. With this convention it will then suffice to specify only the nontrivial part of the representation, omitting any reference to the category prefix.

Given the fact that we can always construct a unique representation for every category, we sometimes have a choice between two options. We can base the representation of the objects in a category on the intrinsic identification of the objects. Categories with such a representation are called basic categories. The subset of C containing the names of these categories is called CB. This option typically is the (only) one for attribute categories.

Or, in case of a category satisfying a primary key constraint, one has the alternative of composing the representation of such a category from the key properties and the representations of the range categories of these key properties. Categories with such a representation are called derived categories. We denote the subset of C containing the names of these categories with CD. Obviously C = CB ∪ CD and CB ∩ CD = ∅.

The notions introduced above can be formalized as follows.

Definition 9. Domain Specification

Let F = < C, P, D, R, V > be a functional database scheme with data language LF. A domain specification for F is a pair < B, G >, satisfying

- B is a set valued function with dom(B) ⊆ C and A c1, c2 ∈ dom(B) : c1 ≠ c2 ⇒ B(c1) ∩ B(c2) = ∅
- G is a set valued function with dom(G) = C \ dom(B) and A c ∈ dom(G) : G(c) is the primary key of c.

A set B(c) for some c ∈ dom(B) is called a base domain. It contains the representations of the objects in category c. Such objects are called simple. The function G is called the domain derivation function. Dom(G) is the set of categories with a derived domain. An object from such a category is called a complex object. Every set in rng(G) contains the properties that specify the categories from whose domain the domain of the corresponding category is derived.

We have G(c) ⊆ D^(c). The relation between V and the pair < B, G > is as follows.

Definition 10. Domain Function

Let F = < C, P, D, R, V > be a functional database scheme with domain specification < B, G >. For all c ∈ C we define

i) c ∈ dom(B) : V(c) = B(c)

ii) c ∈ dom(G) : V(c) = Π(lambda x ∈ G(c) : V(R(x))).

The lambda expression above defines a set valued function and the Π-operator, called the (generalized) product operator, transforms this function into the set of all functions f satisfying:

x ∈ G(c) ⇒ f(x) ∈ V(R(x)).
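To make the generalized product concrete, the following sketch enumerates it for finite domains. It assumes each function f is represented as a Python dict from key property names to values; `generalized_product` and the tiny sample domains are hypothetical.

```python
from itertools import product


def generalized_product(domains):
    """Return all functions f (as dicts) with f(x) in domains[x] for every x.

    `domains` maps each key property x to the finite set V(R(x)).
    """
    keys = sorted(domains)
    value_lists = [sorted(domains[k]) for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# Hypothetical, deliberately tiny domains for the key properties of 'date'.
date_domain = generalized_product({
    "f13": {1988, 1989},    # year
    "f14": {"jan", "feb"},  # month
    "f15": {1, 2, 3},       # day
})

print(len(date_domain))  # 12, i.e. 2 * 2 * 3
```

Each element of the result is a tuple in the sense of the text: a function whose domain is the set of key properties.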

A complex object belonging to a category c ∈ dom(G) is then represented by a function, or tuple, with domain G(c), the set of primary key properties of c.

The product structure of a derived domain allows us to build quite 'complex' objects indeed. We can nest one complex object inside another. When we are careful not to introduce cycles in the domain structure, a derived domain will be well defined. We formalise this observation in the following.

Lemma

Let F = < C, P, D, R, V > be a functional database scheme with domain specification < B, G > and V defined as above. Let T ⊆ dom(G) × dom(G) be such that (x, z) ∈ T if and only if R^(z) ∩ G(x) ≠ ∅. If the transitive closure of T is irreflexive, then V is well defined.

II!

To ensure that domains are correctly defined, we will therefore assume in the following that the domain derivation function G is such that the transitive closure of T is irreflexive. In that case a domain cannot be derived in any way from itself and it is guaranteed that the recursive definition of 10.ii) is valid.
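Whether the transitive closure of T is irreflexive can be checked by looking for cycles among the derived categories. The sketch below is one possible way to do so, assuming G maps each derived category to its set of key properties and R maps each property to its range category; the depth-first search and all names are illustrative choices, and the data is taken from the University example of section 7.

```python
def derivation_is_acyclic(G, R):
    """True when no derived category is (indirectly) derived from itself."""
    # (x, z) in T iff some key property of x has the derived category z as range.
    T = {x: {R[p] for p in G[x] if R[p] in G} for x in G}
    visiting, done = set(), set()

    def visit(c):
        if c in done:
            return True
        if c in visiting:
            return False  # c is reachable from itself: the closure is reflexive at c
        visiting.add(c)
        ok = all(visit(z) for z in T[c])
        visiting.discard(c)
        if ok:
            done.add(c)
        return ok

    return all(visit(c) for c in G)


G = {"test": {"f7", "f8"}, "exam": {"f10", "f11"}, "date": {"f13", "f14", "f15"}}
R = {"f7": "stud", "f8": "exam", "f10": "course", "f11": "date",
     "f13": "year", "f14": "month", "f15": "day"}

print(derivation_is_acyclic(G, R))  # True: test -> exam -> date, no way back
```

A hypothetical pair of categories whose keys refer to each other would be rejected, since neither domain could be constructed first.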

Definition 11. Representation

Let F = < C, P, D, R, V > be a functional database scheme. A representation of F is specified by a 7-tuple < CB, CD, G, P, D, R, B > with

C = CB ∪ CD

CB = dom(B)

CD = dom(G)

A c ∈ CD : G(c) ⊆ D^(c)

A c ∈ CB : V(c) = B(c)

A c ∈ CD : V(c) = Π(lambda x ∈ G(c) : V(R(x)))

the transitive closure of T (as defined by G) is irreflexive

A c ∈ CD : G(c) is a primary key.

We see that a functional database scheme can be represented in very many ways, depending on the choice of B and G. Each choice of representation has its own emphasis and implications. Note that we have not required that every category


good reason not to do so, it is better to avoid the redundance introduced by keeping a category satisfying a primary key constraint basic.

At this point we can, on the basis of their properties, distinguish three types of categories. We already encountered the attribute categories (see def. 2), which are categories without properties. The other two types are given in definition 12.

Definition 12. Entities and Relationships

Let F = < C, P, D, R, V > be a functional database scheme with data language LF and domain specification < B, G >.

A c ∈ C is called an entity category if and only if

- c is not an attribute category and
- c ∉ dom(G) or A f ∈ G(c) : R(f) is an attribute category.

A c ∈ C is called a relationship category if and only if c is neither an attribute nor an entity category.

III

Note again that our definitions are very similar to most definitions of the Entity-Relationship model. Relationship categories typically have a derived domain, as have some entity categories such as date in fig. 1. An entity category will be represented by a box, a relationship category by a diamond (see table 1).
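The three kinds of categories can be told apart mechanically once D, R and G are known. The following sketch assumes that "a category without properties" means a category that is not the domain of any property, and it uses a fragment of the University example of section 7; the function name `classify` is hypothetical.

```python
def classify(C, D, R, G):
    """Label each category as 'attribute', 'entity' or 'relationship'."""
    with_properties = set(D.values())  # categories that are the domain of a property
    kinds = {}
    for c in C:
        if c not in with_properties:
            kinds[c] = "attribute"
        elif c not in G or all(R[f] not in with_properties for f in G[c]):
            kinds[c] = "entity"
        else:
            kinds[c] = "relationship"
    return kinds


C = {"course", "cname", "exam", "date", "year", "month", "day"}
D = {"f10": "exam", "f11": "exam", "f12": "course",
     "f13": "date", "f14": "date", "f15": "date"}
R = {"f10": "course", "f11": "date", "f12": "cname",
     "f13": "year", "f14": "month", "f15": "day"}
G = {"exam": {"f10", "f11"}, "date": {"f13", "f14", "f15"}}

kinds = classify(C, D, R, G)
print(kinds["exam"], kinds["date"], kinds["course"])  # relationship entity entity
```

Here "date" has a derived domain but is still an entity category, because all of its key properties lead to attribute categories.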

The functional data model is a very general and flexible model. In fact, the relational data model (as well as some other models) can be regarded as a special case, namely the one specified by the following constraints:

- every category occurring as the domain category of a property has a primary key,
- the range category of every property is an attribute category.

Thus there are no relationship categories in the relational data model, only entity categories. Sometimes an extra constraint is imposed: all properties (also the non-identifying ones) are complete. If this constraint is imposed as in Codd's original proposal [C070], no incomplete information can be represented. Such a requirement may turn out quite inconvenient in practical situations, where information often becomes available in portions. In the functional data model, where, in general, partial functions are allowed, no such problems arise.

7. An Example

As an example we will consider a fragment of a University database. This database contains facts about students and the courses they take. The central objects in the following model are therefore the objects "student" and "course". The relationship between students and courses, represented by the object "enrol(ment)", is many_to_many: students usually enrol in more than one course in a given period and the enrolment in a particular course most of the time exceeds unity. Of a student the name, address and department are registered. Each student has a unique registration number. The student's address is a complex object consisting of the three-tuple <street, number, town>. When students have attended a course they usually want to take an exam for the course, for which they get a grade. Exams for a particular course can only be taken a few times a year. Since the University offers many courses, numerous courses have to be examined on the same day, so the object "exam" represents an m-to-n relationship between the objects "course" and "date". Since many students may take a particular exam and may be forced to do so on several occasions when they don't pass the exam on the first try, there also exists an m-to-n relationship between the objects "exam" and "student". This relationship is represented by the category "test". The object "grade" is really a property of the test and so it is made an attribute of test. Just as "address", "date" is a complex object, being characterized by the objects "year", "month" and "day". Finally, apart from their own unique code, courses also have a name.

The short description above can be represented using the following diagram :

[Diagram: Functional Data Model of the University database fragment, showing the categories stud, sname, dept, address, street, number, town, test, grade, exam, course, cname, date, year, month, day and enrol, connected by the properties f1 through f17.]

The diagram shows a relationship category "enrol(ment)" and two further relationship categories, "test" and "exam", which link the two central objects "student" and "course". "Test" is the most complex object: it can be projected onto the objects "student" and "exam", the latter of which itself is a complex object. "Exam" can be projected onto the objects "course" and "date", the latter of which can in turn be projected onto the attributes "year", "month" and "day". Of the remaining objects only "address" is complex, the rest of the categories are simple.

It is simple to read off the database scheme for this fragment from the diagram:

C = { stud, sname, dept, address, street, number, town, test, grade, exam, course, cname, date, year, month, day, enrol }

P = { f1, f2, ..., f16, f17 }

D = { (f1; stud), (f2; stud), (f3; stud), (f4; address), (f5; address), (f6; address), (f7; test), (f8; test), (f9; test), (f10; exam), (f11; exam), (f12; course), (f13; date), (f14; date), (f15; date), (f16; enrol), (f17; enrol) }

R = { (f1; dept), (f2; sname), (f3; address), (f4; street), (f5; number), (f6; town), (f7; stud), (f8; exam), (f9; grade), (f10; course), (f11; date), (f12; cname), (f13; year), (f14; month), (f15; day), (f16; stud), (f17; course) }

We see that the information contained in the database scheme is identical to the scheme information contained in the diagram. One could, given a diagram, do without the formal version of the database scheme and vice versa.

What is still needed is an informal description of the meaning of the functions. For reasons of clarity (in the diagram), we just labeled the functions f1 through f17. The functions express the following properties:

f1 : the department of the student

f2 : the name of the student

f3 : the address of the student

f4 : the street part of the address

f5 : the house number part of the address

f6 : the town part of the address

f7 : the student taking the test

f8 : the exam for which the test is taken

f9 : the grade, obtained for the test

f10 : the subject of the exam

f11 : the date on which the exam can be taken

f12 : the name of the course

f13 : the year part of the date

f14 : the month part of the date

f15 : the day part of the date

f16 : the student enrolled in a course

f17 : the course a student is enrolled in

A domain specification < B, G > going with this database scheme is the following:

G = { (enrol; {f16, f17}), (exam; {f10, f11}), (date; {f13, f14, f15}), (test; {f7, f8}), (address; {f4, f5, f6}) }, and

B = { (year; [1950 ... 2100]),

(month; {jan, feb, ..., dec}),

(day; [1 ... 31]),

(course; {nxm | n ∈ [0 ... 7] and x ∈ {A, B, ..., Z} and m ∈ [010 ... 999]}),

(cname; {DBS1, DBS2, DBS3, OvIS, ...}),

(grade; [1 ... 10]),

(sname; String(20)),

(dept; {MATH, CS, CE, EE, ME, ...}),

(stud; {n ∈ N | n mod 17 = 0}),

(street; String(20)),

(number; [1 ... 1000]),

(town; String(20)) }.

The domain function is given by

V = B ∪ { (enrol; Π1), (exam; Π2), (date; Π3), (test; Π4), (address; Π5) } with

Π1 = Π({(f16; V(stud)), (f17; V(course))}),

Π2 = Π({(f10; V(course)), (f11; V(date))}),

Π3 = Π({(f13; V(year)), (f14; V(month)), (f15; V(day))}),

Π4 = Π({(f7; V(stud)), (f8; V(exam))}),

Π5 = Π({(f4; V(street)), (f5; V(number)), (f6; V(town))}).

We have used a shorthand notation in the definition of V(stud) to denote the set of numbers which are divisible by 17, rather than enumerating all the elements of the set. From this choice of representation it follows that once a student has a registration number, it stays with him for the rest of his life.

8. A Design Method

In this section we will concentrate on the ingredients required for constructing a functional data model, complete with queries. We will distinguish the following steps:

8.1 Scheme

In the introduction we discussed to some extent what is needed to construct the database scheme. The most important ingredient is to establish the boundaries of that part of the world that we want to describe, the object system. Once we have decided on what will belong to the object system, we can continue and make an inventory of what is inside the object system.

The primary input for the construction of the database scheme is a description of the object system. Such a description records the kind of activities that are carried out inside the DoD, who carries them out and what objects are acted on. This description should be as complete as possible. It will in general be an informal description but it has to contain enough detail to be turned into a formal one. From this description we will distil a set of properties and categories.

When the system description is available, one can start to categorize the objects inside the system. At this point it becomes important to abstract from the real situation and decide what aspects are relevant and should be included in the data model and which ones should not. For example, the name and address of a student are not really important for the registration of his grades; however, they have to be included in the datamodel when, for example, test results are usually mailed to the students.

Once we have set the boundaries for the object system and have abstracted from the real life situation, we can construct the scheme < C, P, D, R> by simply enumerating the categories needed and their functional relationships. This information can be represented in a labeled, directed graph.

A top-down approach is useful for constructing the data model. At the start of the


uncover the essential categories and properties. Then, later on in the process, one can step by step refine the model by adding more detail as seems appropriate. At this point presumably all categories that will eventually become entity and relationship categories are known and one is introducing attribute categories. For instance, as in the case of 'address', it is in the first stage of the modeling process presumably sufficient to know that objects of the type address should be present in the data model. Then, at a later stage, it may become necessary to introduce details such as 'street' and 'town'.

Instead of approaching the construction of the datamodel from the system point of view one can also take one of two alternative approaches. One could decide to analyse the changes that take place in the object system: what enters the system, what goes out of it, what is transformed inside the system. This way one finds out what information is handled in and by the system and then also what information is available. By answering the question what changes are worthwhile registering one arrives at the datamodel. The system thus is described from the measurement or supply side point of view.

A third approach consists of pinpointing the "characteristic" queries that one wants to ask the database. The information needed for answering these queries then forms the basis for a datamodel. The system now is being described from the demand point of view.

The three alternative approaches clearly are complementary to one another. Starting from one side one can use insights gained from the other two approaches to check whether no aspects have been overlooked. The approach mentioned first, taking the system point of view, is more fundamental than the other two, since it turns up the more "characteristic" information, viz. the information the system needs to function. The supply side approach then tells us in what form and at what stage this information will become available, whereas the demand side approach will tell us how it will be used.

Starting from the measurement side, one has, at one point, to decide what to register and what not. The criteria for these decisions have to come from studying how the system functions, i.e. by taking the system approach. Starting from the demand side, one has to overcome the difficulties that "characteristic" queries are rather prone to change (by the time the database system is finished the queries it is supposed to answer have become outdated) and that one can never sum up all queries.

Above we have considered the simplified situation that the entire object system has been covered by the system description. It is more common that there exist several overlapping descriptions, each dealing with a part of the system. In our example we only considered the University system as far as it was concerned with students and their academic achievements. It is easy to imagine a similar description of a related part of the system, detailing the activities of the staff. This description will overlap with that of student activities, since the courses that students take are taught by members of the staff, who also will grade the exams. Integration of these independently constructed descriptions (views) and, in particular, their datamodels into a single datamodel for the larger system usually is not such an easy task.

8.2 Differentiation of categories and functions

In the next step we start to differentiate between the different categories and properties. First we identify the attribute categories. Then we concentrate on the other categories and study how objects in those categories are identified. We analyse which properties have to be complete and which sets of properties have to satisfy (primary) key constraints. This allows us to identify the categories that are relationship categories. For instance, when we are using diagrams and encounter the following situation:

we know that category "test" is a relationship category and that properties "f7" and "f8" form a primary key. We will replace this part of the diagram by:

[Diagram: category "test" drawn as a relationship category with primary key properties f7 and f8.]

After we have identified all relationship categories in the data model, we also know the entity categories and we can write down the G-component of the domain specification. All relationship categories and all entity categories occurring only as the domain category of properties with an attribute category as range category are part of the domain of G and have a derived domain.

8.3 Domains

In the previous step all categories with a derived domain have been identified. The categories that remain, attribute and some of the entity categories, have a base domain. For these categories we will have to specify the domains of values that we will use to represent the objects in the corresponding categories with. Once the base domains have been specified, the derived domains can be constructed.

We have to keep in mind here that we are specifying a representation at the logical level. When deciding on what to choose for the representation of a category we therefore need not worry about implementation issues such as the efficiency of space utilisation and processing speed. Presumably, an implementation will store the values that are actually used in a convenient place and manipulate the references to the values, as long as possible, until at some stage output is required.

As representation for the objects in V(c) for some c ∈ C strings of symbols are being used. The symbols are elements from an alphabet and the strings words in a language based on that alphabet. Usually we will only need a subset or a concatenation of subsets of these sets. Examples can be found in section 6. We see that by selecting for some V(c) a particular subset of, let's say, the integers we are imposing a constraint on the representation. For example, by specifying that V(year) = [1920 .. 1999], we have accounted in the lower bound of the range for the fact that the University was founded in 1920 and therefore could not offer exams any earlier.

8.4 Other Constraints

By now we have already been able to incorporate a number of constraints in the data model, some of them being expressed graphically, others emerging in the limitations imposed on the base domains. These constraints typically deal with just one category or one function. It is then still possible to express such a constraint conveniently in the diagram or the domain of a representation. When constraints tie several functions or categories together a simple graphical convention or domain restriction often cannot be found. (Also: too many conventions will obscure the picture.) We then express such a constraint in the data language LF and add it explicitly to CN, the set of constraints that are imposed on the free state space to obtain the state space proper. Examples of such constraints are the composition and combination constraint of section 7. Another example is:

A[ x:date | ((f14.x = "apr" or f14.x = "jun" or f14.x = "sep" or f14.x = "nov") implies f15.x < 31) and

((f14.x = "feb" and f13.x mod 4 = 0) implies f15.x < 30) and

((f14.x = "feb" and f13.x mod 4 ≠ 0) implies f15.x < 29) ]

which is a statement of the fact that the dates used should be valid dates.
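The date constraint translates almost literally into executable form. The sketch below assumes a date is given by its three components (f13 = year, f14 = month, f15 = day) and that the day already lies in the base domain [1 ... 31]; the function name `is_valid_date` is hypothetical. Like the constraint itself, it uses only the "year mod 4" rule for leap years.

```python
def is_valid_date(year, month, day):
    """Evaluate the three implications of the date constraint for one date."""
    if month in ("apr", "jun", "sep", "nov") and not day < 31:
        return False
    if month == "feb" and year % 4 == 0 and not day < 30:
        return False
    if month == "feb" and year % 4 != 0 and not day < 29:
        return False
    return True


print(is_valid_date(1988, "feb", 29))  # True: 1988 mod 4 = 0
print(is_valid_date(1989, "feb", 29))  # False
print(is_valid_date(1988, "apr", 31))  # False
```

The simplified leap year rule accepts 29 February 2100, which is inside the base domain chosen for "year" but is not a leap year in the Gregorian calendar.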

8.5 Queries and Views

Queries have been defined as sets of objects. As such they can be thought of as categories which can be derived or computed from other, more elementary categories. When the objects in the query are of a complex type that is not yet available in the database scheme, we have to incorporate this type in the (extended) database scheme by including an extra category and the necessary properties and add the computation prescription (the definition of the categories and properties in terms of other categories and properties) in the form of a constraint to the definition of the state space. The state of such a derived category in a given database state then constitutes the answer to the query in that database state.

At this point one can, in case the database scheme was obtained by integration of a number of smaller (sub)schemes, make these subschemes available in the form of views. Since a view is nothing but a function which associates with every state of the database a query, this amounts to adding the proper set of computed categories and functions to the database scheme.

9. Standard Constructions

There are a number of structures that one meets fairly often when constructing a data model. We will discuss some of these structures and show how they can be represented in the context of a functional data model.

9.1 Activities

It often happens that objects from several categories are associated with one another by an activity or some other relationship. Consider for instance such "things" as the trip of a trucker, the marriage between two people, the treatment of a patient by a medical doctor or the rental of a house. All of these examples have in common that two objects are associated with each other for some period of time. A trip can be regarded as a combination of a trucker and a lorry which starts at a given time and ends at some later time. The same concepts can be applied to a marriage (concerning a bride and a groom) and a rental (a house and a tenant). The underlying data model is as follows:

[Diagram: category "act" with property f to category a, property g to category b, and properties "start" and "end" to category "time".]

Categories a and b represent the participants in the activity; category "time" is used to represent the duration of the activity. Properties f, g and, say, "start" have to be complete and normally are primary key properties. Consider for instance the rental of a house to a tenant. A person may rent a particular house for several periods of time (say a vacation bungalow) and may also rent more than one house at a time.

The marriage example is a little bit more delicate, because the concept of marriage is more restrictive than that of rental. In the former case, the combination of groom and starting date or of bride and starting date suffices to identify a marriage provided we require, as an ordinary key constraint, that the complementary combination of bride and starting date and of groom and starting date respectively is also unique. In this case only two of the three properties have to be known to identify a particular marriage.

We would represent category 'act' as a relationship category:

[Diagram: category 'act' drawn as a relationship category between categories a and b.]

From the examples it is clear that the end time of the activity does not have to be known.

A frequently occurring situation is that participants take part in only one activity at a time:

A[ x ∈ act | A[ y ∈ act | ((x ∈ dom(end) and f.x = f.y) implies end.x ≤ start.y) or

((y ∈ dom(end) and f.x = f.y) implies start.x ≥ end.y) ] ].

9.2 Relationships

Sometimes a relationship has a more permanent character, such as in the relation between students and the books they are required to study for a given course. In this case there are many-many relationships between students and books, concerning the possession of a book by a student, between books and courses, concerning the books that are required reading material for a course, and between students and courses, concerning the registration of students for a course. These three relationships can be modeled by a relationship category:

[Diagram: two categories linked through a relationship category.]

9.3 Functions of more than one variable

When we want to model a function F which has as its domain the Cartesian product of more than one category we use one category representing the Cartesian product and another category containing the function values. The property between the two additional categories then represents function F. Consider for example the function F ∈ A × B × C → D. This function is modeled as

[Diagram: a category representing the Cartesian product, with properties to the categories A, B and C, and a property F to category D.]

9.4 Set valued Functions

In the functional data model of Shipman [SHI81] functions are multivalued. Application of a function F to an element x of its domain category a will produce a subset of the range category b: F(x) ⊆ b for x ∈ a. Graphically this is depicted as:

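[Diagram: multivalued function F from category a to category b.]

This situation can be modeled by another application of the relationship category:

[Diagram: a relationship category with property f to category a and property g to category b.]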

The functions f and g have to satisfy the following constraint: for x ∈ dom(F) : x ∈ a and g.(f^x) = F(x)


When we want to realise that category a at all times (i.e. for all states of the database) represents dom(F) we have to impose a surjectivity constraint on property f.
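The way a multivalued function F is recovered from the two single-valued properties can be sketched as follows. The sketch assumes f and g are Python dicts defined on the objects of the relationship category and reads g.(f^x) as g applied to every object that f maps to x; the function name and the data are hypothetical.

```python
def apply_multivalued(f, g, x):
    """Return { g.r | f.r = x }, the set F(x) rebuilt from f and g."""
    return {g[r] for r in f if f[r] == x and r in g}


# Hypothetical relationship objects r1..r3 linking students (f) to books (g).
f = {"r1": "ann", "r2": "ann", "r3": "bob"}
g = {"r1": "book1", "r2": "book2", "r3": "book1"}

print(apply_multivalued(f, g, "ann") == {"book1", "book2"})  # True
print(apply_multivalued(f, g, "cid"))                         # set()
```

An object such as "cid" that is not in the range of f yields the empty set, which is why f has to be onto if category a is to coincide with dom(F).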

9.5 Order in Categories

We have already seen one kind of ordering in a category, namely when defining the data language LF we required that every object in a category can be compared - using, e.g., ≤ - with every other object in that category. In addition to this kind of ordering there may also be others, depending for instance on a property of the category. Consider:

[Diagram: property f from category a to category b.]

Property f defines a partitioning of category a into a number of classes of objects. Two objects in category a belong to the same class when they are mapped by f into the same object of category b. We can order, differently from their natural order, the objects in a class as follows:

1. Introduce a property g, which is one-to-one from 'a' to 'a'

[Diagram: category a with property f to category b and property g from a to a.]

which satisfies the following constraints:

- g relates only elements of the same class

A[ x:dom(g) | f.(g.x) = f.x ]

- g relates all elements of the same class

A[ x1:a | A[ x2:a | (f.x1 = f.x2 and not E[ z:dom(g) | g.z = x1 or g.z = x2 ]) implies x1 = x2 ] ]

- g does not introduce cycles in category 'a'

A[ y:b | f^y = ∅ or E[ x:a | x ∈ f^y and x ∉ rng(g) ] ]

2. Another solution would be to introduce both a property and a category as follows: