
Datalog and the Relational Database Model

The language we have seen so far, with (logical) variables, constants, user-defined predicates (which can be regarded as program procedures), and the equality constraint =/2, is a constraint language. However, it is severely limited by the lack of data structures and arithmetic operations, which we will introduce later. In fact, its expressive power is equivalent to that of propositional logic (i.e., logic without variables). Every program in our first language can be rewritten into a semantically equivalent propositional program: a program has only finitely many constants, so each clause can be replaced by its finitely many ground instances. Conversely, any propositional program is directly a valid program in our language.
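As a small worked illustration of this rewriting, the sketch below grounds a program. The program itself and the invented proposition names such as likes_ann_tea are hypothetical. Each ground atom becomes a 0-ary proposition, and the rule is replaced by its ground instances over the constants of the program. It is written in Prolog syntax, assuming an ISO-style system such as SWI-Prolog.

```prolog
% Original program: the rule contains the variable X.
likes(ann, tea).
likes(bob, coffee).
tea_drinker(X) :- likes(X, tea).

% Propositional version (hypothetical names): every ground atom becomes a
% 0-ary proposition, and the rule is replaced by one ground instance per
% constant of the program (ann, bob, tea, coffee).
% Propositions that no fact asserts are declared dynamic, so that
% calling them simply fails instead of raising an existence error.
:- dynamic likes_bob_tea/0, likes_tea_tea/0, likes_coffee_tea/0.

likes_ann_tea.
likes_bob_coffee.

tea_drinker_ann    :- likes_ann_tea.
tea_drinker_bob    :- likes_bob_tea.
tea_drinker_tea    :- likes_tea_tea.
tea_drinker_coffee :- likes_coffee_tea.
```

Both versions give the same answers. The query `?- tea_drinker(ann).` succeeds, as does the proposition `tea_drinker_ann`. Both `tea_drinker(bob)` and `tea_drinker_bob` fail.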

Nevertheless, augmenting this language with numbers, arithmetic operations and, for the sake of practicality, other facilities such as negation yields a much more expressive language, termed Datalog, which is often used in advanced databases. Even without adding anything to our language, we will show how it can be used directly to model common operations in relational databases.

The basic structural components of relational databases are tables, which are collections of tuples (rows) that all have the same number of components. Each component of every row has a type, such as string, number, or date, usually taken from a set of predefined types available in the database; we will not deal with such types for the moment. The components in the same position of all the rows of a table form a column, and every column has an attribute, which usually names that column. Figure 2.4 shows two tables which could be part of a database collecting information about persons and the cities where they have lived.

Figure 2.4: Two tables in the relational database model

Name	Age	Sex
Brown	20	M
Jones	21	F
Smith	36	M

Person

Name	Town	Years
Brown	London	15
Brown	York	5
Jones	Paris	21
Smith	Brussels	15
Smith	Santander	5

Lived-in

The order of rows is immaterial, since rows are not accessed by position but by matching their components. Similarly, the order of columns is not important either, since columns are labeled with attributes; it will, however, be important for our translation into a logic language. Note also that duplicate rows are not allowed or, rather, that they are meaningless, since duplicated solutions are not taken into account at all.

A translation into our logic language maps every part of the database to a component of the constraint language as follows:


Relational database	Logic program
Relation name	Predicate symbol
Relation	Predicate consisting of ground facts (facts without variables)
Tuple	Ground fact
Attribute	Argument of predicate

Arguments of an atom cannot be given names in our language, although other logic languages allow it. The correspondence between attribute names and argument positions must therefore be kept fixed throughout the translation. The fragment of database in Figure 2.4 can be translated into the following set of facts:

```
person(brown,20,male).
person(jones,21,female).
person(smith,36,male).

lived_in(brown,london,15).
lived_in(brown,york,5).
lived_in(jones,paris,21).
lived_in(smith,brussels,15).
lived_in(smith,santander,5).
```

Using this translation scheme, in which a set of facts models a static database, the usual operations on relational databases can be easily defined and implemented using clauses. As mentioned before, the result is that not only the database, but also the different queries, views, etc., can be programmed in the same language.
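This minimal sketch queries the translated facts and defines a view as an ordinary clause, assuming a standard Prolog system such as SWI-Prolog. The view name lived_15_years is hypothetical. Like the language described so far, it uses only constants and unification.

```prolog
% Facts for the Lived-in table of Figure 2.4.
lived_in(brown, london, 15).
lived_in(brown, york, 5).
lived_in(jones, paris, 21).
lived_in(smith, brussels, 15).
lived_in(smith, santander, 5).

% A view (hypothetical name): people who lived exactly 15 years in a town.
% The constant 15 in the body acts as an equality test on the third column.
lived_15_years(Name, Town) :- lived_in(Name, Town, 15).
```

Queries are ordinary goals. `?- lived_in(brown, Town, Years).` returns `Town = london, Years = 15` and, on backtracking, `Town = york, Years = 5`. `?- lived_15_years(Name, Town).` returns the pairs brown/london and smith/brussels.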

Union: two clauses define a table whose tuples are those belonging either to table r or to table s. Extending it to more than two tables is straightforward:
```
r_union_s(X_1,...,X_n) :- r(X_1,...,X_n).
r_union_s(X_1,...,X_n) :- s(X_1,...,X_n).
```
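Here is a concrete instance of the union, as a sketch in standard Prolog. The two unary tables are illustrative assumptions. town/1 lists the towns appearing in the Lived-in table, and capital/1 lists a few capital cities.

```prolog
% Illustrative table r: the towns appearing in the Lived-in table.
town(london).
town(york).
town(paris).
town(brussels).
town(santander).

% Illustrative table s: a few capital cities.
capital(london).
capital(paris).
capital(brussels).
capital(madrid).

% Union: a tuple is in the result if it is in either table.
town_union_capital(X) :- town(X).
town_union_capital(X) :- capital(X).
```

Backtracking over `?- town_union_capital(X).` yields nine answers, because london, paris and brussels are found once through each clause. Collecting them with `setof/3` gives the duplicate-free set `[brussels, london, madrid, paris, santander, york]`.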
Set Difference: tuples belonging to one table but not to the other. The first clause below computes r - s (the tuples of r that are not in s). The second clause adds s - r, so the two clauses together define the symmetric difference. The implementation of set difference needs negation, which we have not discussed yet; we will come back to it later. For now, it suffices to know that a general and correct implementation of negation in logic languages is very difficult, and usually only a restricted version of full logical negation is available. Fortunately, for the purpose at hand (relational databases), a sound logical negation can be implemented, since the tables are always finite and there are no data structures which can construct infinite objects.
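```
% r - s: tuples of r that are not in s
r_diff_s(X_1,...,X_n) :- r(X_1,...,X_n),
                         not s(X_1,...,X_n).
% s - r: together with the clause above, this yields the
% symmetric difference of r and s
r_diff_s(X_1,...,X_n) :- s(X_1,...,X_n),
                         not r(X_1,...,X_n).
```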
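This runnable sketch shows both forms on two illustrative unary tables, town/1 (the towns in the Lived-in table) and a hypothetical capital/1. It assumes a standard Prolog system, where `\+` is the usual spelling of negation as failure. Each positive goal comes before the negated one, so the negated goal is always called with a bound argument. With an unbound variable, `\+ capital(X)` would simply fail and no difference would be computed.

```prolog
town(london).
town(york).
town(paris).
town(brussels).
town(santander).

capital(london).
capital(paris).
capital(brussels).
capital(madrid).

% Set difference town - capital: towns that are not capitals.
% town(X) binds X before the negated goal is tried.
town_diff_capital(X) :- town(X), \+ capital(X).

% Symmetric difference: in one table but not in the other.
town_symdiff_capital(X) :- town(X), \+ capital(X).
town_symdiff_capital(X) :- capital(X), \+ town(X).
```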
We will discuss negation in more depth later.

Cartesian Product:


```
r_X_s(X_1,...,X_m,X_{m+1},...,X_{m+n}) :-
        r(X_1,...,X_m), s(X_{m+1},...,X_{m+n}).
```
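Here is a sketch of the Cartesian product of the two tables of Figure 2.4 in standard Prolog; the predicate name person_x_lived_in is illustrative. The result has 3 + 3 = 6 columns and 3 × 5 = 15 rows. It pairs every person with every residence, whether or not the two rows are related.

```prolog
person(brown, 20, male).
person(jones, 21, female).
person(smith, 36, male).

lived_in(brown, london, 15).
lived_in(brown, york, 5).
lived_in(jones, paris, 21).
lived_in(smith, brussels, 15).
lived_in(smith, santander, 5).

% Cartesian product: every row of person combined with every row of lived_in.
% The two goals share no variables, so all combinations are produced.
person_x_lived_in(N, A, S, N2, T, Y) :-
    person(N, A, S),
    lived_in(N2, T, Y).
```

For instance, the product contains the row `person_x_lived_in(brown, 20, male, jones, paris, 21)`, which pairs Brown with Jones's residence. This is why the product is usually restricted to rows that agree on some column, as the join does.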

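Projection: keeping only some of the columns of a table (here, the first and third columns of a ternary table r):
```
r13(X_1,X_3) :- r(X_1,X_2,X_3).
```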
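Here is a concrete projection in standard Prolog. It keeps the first and third columns (Name and Sex) of the Person table, in the same way as r13 keeps columns 1 and 3 of r. The name person13 is illustrative.

```prolog
person(brown, 20, male).
person(jones, 21, female).
person(smith, 36, male).

% Projection on columns 1 and 3: the age is matched by an anonymous
% variable and does not appear in the head.
person13(Name, Sex) :- person(Name, _, Sex).
```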
Selection: the selection criterion is just another predicate, which succeeds or fails for a given tuple. In general it could be any user-defined predicate, but in this case we will use the arithmetical predicate , which we assume is already defined by the system.
```
r_selected(X_1,X_2,X_3) :- r(X_1,X_2,X_3),
                           (X_2,X_3).
```
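As a concrete sketch, assuming a standard Prolog system, the following selects the rows of the Lived-in table whose Years column is at least 10. The comparison `>=/2` is Prolog's built-in arithmetic comparison, chosen here only for illustration. It is not necessarily the predicate used in the clause above, and the name long_stay is hypothetical.

```prolog
lived_in(brown, london, 15).
lived_in(brown, york, 5).
lived_in(jones, paris, 21).
lived_in(smith, brussels, 15).
lived_in(smith, santander, 5).

% Selection: keep a row only if the criterion succeeds for it.
% The row is retrieved first, so Years is bound when it is compared.
long_stay(Name, Town, Years) :-
    lived_in(Name, Town, Years),
    Years >= 10.
```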

Some operations can be derived from the ones above, but they can also be expressed more directly in CLP:

Intersection: tuples which are in r and s at the same time:
```
r_meet_s(X_1,...,X_n) :- r(X_1,...,X_n),
                         s(X_1,...,X_n).
```
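Here is a concrete intersection in standard Prolog, using two illustrative unary tables. town/1 holds the towns of the Lived-in table and capital/1 is a hypothetical list of capital cities.

```prolog
town(london).
town(york).
town(paris).
town(brussels).
town(santander).

capital(london).
capital(paris).
capital(brussels).
capital(madrid).

% Intersection: both goals must succeed for the same X.
town_meet_capital(X) :- town(X), capital(X).
```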

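Join: tuples of r that have an element in common with some tuple of s in a given column (here, the second one):
```
% As written, the result keeps only the columns of r: the tuples of r
% whose second component also appears in the second column of some
% tuple of s. A full join would also list X_1',X_3',...,X_n' in the head.
r_joinX2_s(X_1,...,X_n) :-
        r(X_1,X_2,X_3,...,X_n),
        s(X_1',X_2,X_3',...,X_n').
```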
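Here is a concrete join of the two tables of Figure 2.4 on their shared Name column, as a sketch in standard Prolog. This version (with the hypothetical name person_join_lived_in) joins on the first column. It keeps the columns of both tables and writes the shared Name column once.

```prolog
person(brown, 20, male).
person(jones, 21, female).
person(smith, 36, male).

lived_in(brown, london, 15).
lived_in(brown, york, 5).
lived_in(jones, paris, 21).
lived_in(smith, brussels, 15).
lived_in(smith, santander, 5).

% Join on the Name column: the shared variable Name forces both rows
% to agree on it, and Name appears only once in the head.
person_join_lived_in(Name, Age, Sex, Town, Years) :-
    person(Name, Age, Sex),
    lived_in(Name, Town, Years).
```

Unlike the 15-row Cartesian product of the same tables, the join has one row per residence, five in total.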

Duplicate answers can appear even if there are no duplicates in the original table (e.g., when projecting the table Lived-in on its first argument). This is not a theoretical problem, since duplicates are simply ignored, but it can be a practical one. Database implementations automatically discard repeated tuples; similarly, CLP languages have built-in primitives which gather all the answers to a query and remove duplicates.
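In standard Prolog, for instance, the built-ins findall/3 and setof/3 can be used as sketched below; the predicate name resident is hypothetical. findall/3 keeps every answer, duplicates included. setof/3 returns a sorted list without duplicates.

```prolog
lived_in(brown, london, 15).
lived_in(brown, york, 5).
lived_in(jones, paris, 21).
lived_in(smith, brussels, 15).
lived_in(smith, santander, 5).

% Projection of lived_in on its first column: a person appears once per
% town they lived in, so answers repeat.
resident(Name) :- lived_in(Name, _, _).
```

`?- findall(N, resident(N), L).` gives `L = [brown, brown, jones, smith, smith]`, while `?- setof(N, resident(N), L).` gives `L = [brown, jones, smith]`. Wrapping the projection in resident/1 matters. In `setof(N, lived_in(N, _, _), L)` the anonymous variables are free, so setof/3 returns a separate list for each of their bindings. The alternative is to quantify them existentially, as in `setof(N, T^Y^lived_in(N, T, Y), L)`.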

∴

So-called deductive databases are relational databases which make heavy use of concepts from first-order logic to implement (actually, to program) deduction and consistency rules explicitly. They commonly use a language similar to the one we have just developed, plus some extended facilities. This language is usually a subset of a full-fledged logic-based language. It is a language of this kind, further augmented with constraint-solving capabilities, that we are aiming at now.

∴


MCL
1998-12-03