| |

Lightweight Deductive Databases on the World-Wide Web

Department of Computer Science
The University of Melbourne
Parkville, Victoria 3052, Australia
Email: {swloke,ad,leon}@cs.mu.oz.au

Abstract:
We investigate a Web information structuring mechanism called lightweight deductive databases. Lightweight deductive databases enable more sophisticated automated searching, extraction, and processing, and can facilitate agent-based programming. We also explore how these deductive databases benefit from being distributed on the Web.

1. Introduction

Our aim is to enhance the Web with information which is more susceptible to sophisticated automated searching, extraction, and processing. Such information needs to be stored in a structured form, and be sufficiently high-level (or abstract) to be programmable, while remaining readable.

An interesting candidate as a representation formalism is logic programming [24]. Logic programming is based on mathematical logic, where computation is treated as deduction from a set of axioms or rules. Logic programming provides a uniform means for representing data and computations, is declarative (compared to imperative languages), and has a solid semantic basis. In addition, knowledge based query processing is possible with the aid of logic programming. For instance, we can represent concept hierarchies, and meta-level knowledge about databases.

Deductive databases extend relational databases, utilising logic programming rules for more complex data modelling [22]. A deductive database is, in essence, a logic program: base relations map to facts, and rules are used to define new relations in terms of base relations, and to process queries. Also, deductive databases structure information according to predefined conceptual schema. Hence, deductive databases qualify as an appropriate metaphor for information processing on the Web.

This paper investigates how deductive databases can be incorporated into Web pages. In the spirit of [11], we call these databases lightweight deductive databases, since our intention is to use the Web as a source of structured information, rather than to provide functionalities such as transaction processing and query optimisation.

Since lightweight deductive databases are distributed on the Web, they have the following features:

Distributed maintenance. A Web page can contain a part of a deductive database. The completed database can be created as necessary by retrieving the relevant Web pages, and composing them together. This allows the components of a database to be separately maintained, and combined only during query processing.
Extensibility. The dynamic nature of the complete database allows the incremental addition, or removal, of parts during query evaluation. This facilitates the incremental development of lightweight deductive databases. Moreover, users, who are not the database creators, can extend the existing database by writing their own rules, using schemata included on the pages containing the database.
Reusability. Rules and knowledge bases for database query processing can be placed in Web pages, encouraging them to be reused, or shared.
Client-side processing. Lightweight deductive databases complement the work on building forms-based interfaces to conventional databases. However, in our model, query processing is carried out on the client-side, rather than on the server machine. This reduces server load, and permits state information, such as the results of previous queries, to be kept in the client-side process or files. We also utilise client-side caching, which means that once the databases are loaded, they need not be fetched again.

We base this work on LogicWeb [19], an integration of logic programming and the Web, which treats Web pages as logic programming modules. Meta-level rules on how LogicWeb modules can be accessed and combined are written as logic programs.

In the rest of the paper, we illustrate the above ideas, by considering lightweight deductive databases for citations. On-line citations are a valuable resource for bibliography entries, and for obtaining papers.

In the following, we assume that the reader is acquainted with Prolog. In §2, we introduce lightweight deductive databases, and give a brief overview of LogicWeb, followed by a simple lightweight deductive database example. In §3, we show how lightweight deductive databases can be combined and extended, and in §4, knowledge-based query processing is explored. In §5, we review related work, and compare existing deductive database systems with our lightweight counterpart. §6 concludes.

2. Lightweight Deductive Databases

LogicWeb provides the framework for lightweight deductive databases. Below, we outline LogicWeb, and describe a lightweight deductive database of citations.

2.1 LogicWeb

LogicWeb treats Web pages as logic program modules, termed LogicWeb modules [19]. Ordinary Web pages can be parsed to extract facts, such as a collection of links used in the page. This provides an additional layer of abstraction beyond the text of a page.

More importantly, a LogicWeb module can contain a logic program written in Prolog with some extensions. Among the extra operators are ones for invoking goals in other modules, and for combining modules together. A key element of these features is the use of terms of the form m_id(URL) to refer to modules, where URL is the URL of the module.

When a LogicWeb module is downloaded, its logic program, and any other information it might contain (written in ordinary HTML) are loaded into a Prolog environment. The ordinary HTML text is stored in a fact, and can be subsequently processed (e.g., links it contains can be extracted and stored in newly created facts, as mentioned above).

A LogicWeb module may contain a lightweight deductive database (or component thereof). In addition, it may include descriptions of the database (e.g., database schemata) written in ordinary HTML. In the rest of the paper, we shall use the term database modules to refer to LogicWeb modules containing lightweight deductive databases.

2.2 A Citation Database

A lightweight deductive database is a logic program with its clauses categorised according to three main roles: base relations, derived relations, and rules to process queries

. They are illustrated below.

Consider a lightweight deductive database of publication citations. A citation consists of distinct components (or attributes), and it should be possible to perform queries using these attributes. Also, it is convenient for citations to be separately stored and maintained (e.g., by the authors themselves). For these reasons, it is useful to store citations in structured databases, and to combine them only during query evaluation. A possible schema for citations is:
pub_cit(authors,title,pub_type,collection_name,web_location,date)

The schema describes the components of a typical publication citation: the names of the authors, the title of the paper (which is used as the primary key, indicated in bold), the type of the publication (e.g., conference, technical report, or journal), the collection in which the paper was published, the URL of an on-line version, and the date of publication.

An instance of the schema is:

pub_cit([author("Seng","Loke"),author("Andrew","Davison")],
  "Logic Programming with the World-Wide Web",
  conference,"Hypertext '96",
  "http://www.cs.mu.oz.au/~swloke/papers/paper1.ps.gz",
  date(march,1996)).

This fact is not in relational database normal form, as structured data is used. Some of the attribute values can be stored as atoms instead of strings, but strings are easier to manipulate in queries (e.g., we could utilise pattern matching with a tolerance for character mismatches).

From a database of pub_cit/6 facts, a database of journal citations in 1996 can be formed, which conforms to the schema:

journal_1996_cit(authors,title,collection_name,month)

The new relation is derived using the rule:

journal_1996_cit(Authors,Title,CollectionName,Month) :-
    pub_cit(Authors,Title,journal,CollectionName,_,date(Month,1996)).

Rules can be written to process queries on the above databases. For example, the following rules find all the titles of papers by an author in a given year:

get_titles(Name,Year,Titles) :-
    setof(Title,get_title(Name,Year,Title),Titles).

get_title(Name,Year,Title) :-
    pub_cit(Authors,Title,_,_,_,date(_,Year)),
    member(Name,Authors).

A database module can be downloaded to a client system, and queried. Alternatively, the base relations (facts) can be stored in one module, and the query processing rules in another. These can be downloaded separately, and combined on the client-side.

3. Combining and Extending Lightweight Deductive Databases

In this section, we explore the use of virtual relations and relational joins. A virtual relation is formed from other database relations by a set of rules, all defining the same head predicate, but each using a different relation in its body. In a relational join, two relations are combined based on shared attribute values.

From structured logic programming [7], we appropriate the notions of context switching and the composition operations of union, and overriding union. By composing database modules, we effectively combine the databases in those modules.

Other ways of composing logic programs have been considered in the literature, such as intersection [5], retraction [6], and various forms of inheritance [7, 21]. Utilising a larger repertoire of operators allows more kinds of compositions to be expressed in queries, but increases complexity.

3.1 Virtual Relations and Joins

The LogicWeb goal, Module#>Goal, causes the evaluation of the logic programming Goal in the LogicWeb Module. Module is loaded the first time a goal requesting it is evaluated. Subsequent evaluations of goals in the module use the already loaded version. The same loading behaviour is employed by the other LogicWeb operators introduced below. The #> operator executes a goal using only the predicates in its prescribed module, excluding other modules.

In the following examples, we assume a database of technical report details with the schema:

tr_cit(authors,title,tr_number,web_location,date)

A virtual relation, cit/6, of citations can be defined in terms of technical report and publication details:

cit(Authors,Title,Type,BookName,WebLocation,Date) :-
    pub_cit(Authors,Title,Type,BookName,WebLocation,Date).
cit(Authors,Title,technical_report,BookName,WebLocation,Date) :-
    tr_cit(Authors,Title,BookName,WebLocation,Date).

cit/6 assumes that pub_cit/6 and tr_cit/5 are present in the current module (i.e., the module containing the virtual relation). However, if they are stored in different modules, we can employ #> to retrieve them:

cit(Authors,Title,Type,BookName,WebLocation,Date) :-
    location_of(publications,PubsURL),
    m_id(PubsURL)#>pub_cit(Authors,Title,Type,BookName,WebLocation,Date).

cit(Authors,Title,technical_report,BookName,WebLocation,Date) :-
    location_of(technical_reports,TRsURL),
    m_id(TRsURL)#>tr_cit(Authors,Title,BookName,WebLocation,Date).

The locations of the relevant modules are obtained from location_of/2 facts.

A derived relation of citations that are both technical reports and publications can be formed as follows:

pub_tr_cit(Authors,Title,PubWebLocation,TRWebLocation) :-
    location_of(publications,PubsURL),
    location_of(technical_reports,TRsURL),
    m_id(PubsURL)#>pub_cit(Authors,Title,_,_,PubWebLocation,_),
    m_id(TRsURL)#>tr_cit(Authors,Title,_,TRWebLocation,_).

The shared variables, Authors and Title, are used to select citations common to both relations.

3.2 Forming Databases Using Union

The union of two LogicWeb modules corresponds to the set-theoretic union of the clauses in both modules [7]. Union is useful when a query is to be evaluated with respect to two or more modules. In LogicWeb, this is expressed using the lw_union/1 construct in a goal, such as:

lw_union(ListOfModules)#>Goal

This goal evaluates Goal in the union of the modules in ListOfModules.

We can use the following rule to retrieve citation authors from a union of databases:

authors(CitMod,Modules,Authors) :-
    lw_union([CitMod|Modules])#>cit(Authors,_,_,_,_,_).

CitMod contains the original definition of cit/6 from §3.1. Modules is a list of modules, containing pub_cit/6 and tr_cit/5 facts. Citations are retrieved from the union of all the modules, treated as a single citation database. Hence, the term lw_union([CitMod|Modules]) represents a virtual citation database. If the modules are on different servers, they are retrieved before the cit/6 goal is evaluated. This behaviour allows the combined database to be spread across several servers.

3.3 Using Dynamic Composition

In discussing the dynamic composition operations, we employ the notion of current context of a goal (or operation). This refers to the module, or the union of modules, in which the operation is evaluated. The dynamic composition operations compose a prescribed module with their current context, whereas the #> operation ignores its current context.

Dynamic union

The dynamic union operation, denoted by Module>>Goal, forms the union of Module and its current context, and evaluates Goal in this union. This differs from a Module#>Goal operation, which ignores its current context. The difference is illustrated below. The operation is dynamic in that the union is only carried out for the duration of the evaluation of the goal.

The following rule utilises the module m_id(AddressBookURL) which contains a predicate cit_db/2. cit_db/2 contains the database module URLs for every department. The module URL for a given department is retrieved, and the citation authors in that module are then retrieved.

authors(Department,AddressBookURL,Authors) :-
  m_id(AddressBookURL)#>cit_db(Department,CitationsModURL),
  m_id(CitationsModURL)>>cit(Authors,_,_,_,_,_).

The use of >> enables cit/6 to be obtained from the current module, and the pub_cit/6 and tr_cit/5 facts from m_id(CitationsModURL). In contrast, the use of #> means that only the cit_db/2 facts in m_id(AddressBookURL) are used, and any such facts in the current module are ignored.

Another use of dynamic union is to load modules containing utility predicates for processing queries. Dynamic union allows the utility predicates to call predicates in the current context.

Dynamic overriding union

Dynamic overriding union, denoted by Module@>Goal, is used when some predicate definitions in Module are to be used in preference to those in the current context. If the dynamic union operation of the previous example was:

m_id(CitationsModURL)@>cit(Authors,_,_,_,_,_)

then the definition of cit/6 in the current context would momentarily be overridden by the definition of cit/6 in m_id(CitationsModURL).

4. Knowledge-Based Query Processing on the Web

Complex forms of query processing are possible with lightweight deductive databases. We demonstrate this with a longer example involving the logic programming encoding of the domain-specific information within Web pages, and knowledge about the structure of Web sites of a certain type. This information is used to guide automated search.

4.1 Page Types

We assume that research in a Computer Science department is organised into sections. A section is divided into groups, and sections and groups are composed from projects. Each project consists of researchers. This follows the organisation of research in the Computer Science department at the University of Melbourne as of May 1996.

When describing the structure of a Web site, we use the notion of page types. Page types are names for sets of pages (some of which may contain only one page). They enable individual pages to be referred to by their types rather than by their URLs which would be too specific, and may change over time. Page types also allow collections of pages to be conveniently addressed. We utilise the following page types in our example: dept, research, project, project_members, and researcher. These page types will be used for describing our Web search strategies.

The Web structure for a Computer Science department is shown in Figure 1. The departmental home page (of type dept) has a link to a page (of type research) containing research information (in HTML). This information includes a description of the relationships between sections, groups, and projects, and links to pages describing projects (of type project). Each project page has a link to a page containing information about its members (of type project_members), and each project_members page has links to the members' home pages (which are of type researcher). The publication citations for each project are distributed (and separately maintained by authors) in database modules accessible from the home pages of the researchers.

Figure 1: A representative diagram of the hypertext structure rooted at a departmental home page. The arrows denote the sequence in which the various page types are reached, starting from the dept page.

4.2 Searching for Citations

We wish to develop code to find all the citations for a given section. Two approaches are possible:

We can store the URLs of all the relevant citation database modules for a section in a pre-determined module, and query the citations using it. The authors/3 predicate of §3.3 employs this technique.
Starting from some pre-determined page, we can search the Web site for the URLs of the citation database modules.

The first method requires the maintenance of a module containing the database module URLs. For example, if a researcher leaves or joins a section, or if the URLs of pages change, then the module has to be modified. For this reason, we shall use the second method.

Finding database modules consists of two main steps:

use the given section name to find the constituent project names; and
using those names, search the Web site by following the sequence of page types shown in Figure 1, starting from the dept page, until the relevant database modules are found.

The search starts from the dept page, rather than the research page, in order to accommodate changes to the research page's URL. The URL of the dept page is least likely to change.

In order to implement the above strategy, we need to formalise the relationships between sections, groups, and projects (to carry out (1)), and specify the Web structure (in order to do (2)).

4.3 Representing Knowledge

Representing Research Information

To find the projects contained in a given section, we can parse the research page, and extract the required information. However, we would need to do that each time a query is processed. A more efficient alternative is to represent the knowledge on that page as logic programming facts and rules, and reason with them.

The relationships between sections, groups, and projects, are depicted as a hierarchy of concepts in Figure 2. This is a representative diagram; a typical department would have many more sections, groups, and projects.

Figure 2: A hierarchy of sections, groups and projects. The edges represent the has_part/2 relationships.

The relationships are represented using has_part/2 facts, which specifies how sections, groups, and projects are related. contains/2 defines the transitive closure of the has_part/2 relation.

has_part(section(programming_langs),group(new_declarative_langs)).
has_part(group(new_declarative_langs),project(mercury)).
has_part(group(new_declarative_langs),project(lygon)).
has_part(section(programming_langs),project(prolog_techniques)).

contains(X,Y) :- has_part(X,Y).
contains(X,Z) :- has_part(X,Y), contains(Y,Z).

Representing the Web structure

The sequence of page types in Figure 1 can be captured using precedes/2 facts:

precedes(dept,research).
precedes(research,project).
precedes(project,project_members).
precedes(project_members,researcher).

4.4 An Implementation of Citation Finding

A section name (e.g., ``Programming Languages'') can be selected by a user from a menu displayed by the browser. This is translated into a section/1 term, and a goal like the following is generated:

get_citations(section(programming_langs),Citations)

Evaluation of the goal results in Citations being instantiated with a list of pub_cit/6 facts. These are formatted and displayed by the browser.

get_citations/2 uses the logic programming representation of the research information (contains/2 and has_part/2) to retrieve all projects belonging to the specified section, and then collects the citations from those projects:

get_citations(Section,Citations) :-
    setof(project(Name),contains(Section,project(Name)),Projects),
    collect_citations(Projects,Citations).

collect_citations/2 uses the Web structure information (coded as precedes/2 facts) and the project names to search over the Web site for citation database module IDs. A union of these databases is accessed to obtain the citations for the projects.

The definition of collect_citations/2 is:

collect_citations(Projects,Citations) :-
    location_of(dept,DeptURL),
    setof(ModId,find_citmod(dept,DeptURL,Projects,ModId),ModIds),
    retrieve_citations(ModIds,Citations).

The call to location_of/2 retrieves the URL of the dept page. The URLs of the citation database modules are found by calling find_citmod/4 in a setof/3 call. The predicate retrieve_citations/2 retrieves citations from the union of the modules using code very much like that in §3.2.

The initial call to find_citmod/4 in collect_citations/2 searches from the dept page until a researcher page is found. The database module URL on that page is then extracted by extract_modURL/2. find_citmod/4 makes use of a predicate next_link/5, which uses heuristics to determine the next link to follow on any page.

The definition of find_citmod/4 is as follows:

find_citmod(researcher,PageURL,_,m_id(CitationsModURL)) :-
    extract_modURL(PageURL,CitationsModURL).
find_citmod(PageType,PageURL,Projects,CitationsModId) :-
    next_link(PageType,PageURL,Projects,NextPageType,NextPageURL),
    find_citmod(NextPageType,NextPageURL,Projects,CitationsModId).

next_link/5 examines the links on the current page (of type PageType and of URL PageURL) to determine the next page to visit, returning its type (i.e., NextPageType) and URL (i.e., NextPageURL). The choice of page is constrained by the list of project names in Projects, obtained earlier from contains/2.

The definition of next_link/5 is as follows:

next_link(PageType,PageURL,Projects,NextPageType,NextPageURL) :-
    precedes(PageType,NextPageType),
    m_id(PageURL)#>link(Label,NextPageURL),
    useful_link(Projects,NextPageType,Label,NextPageURL).

next_link/5 initially obtains the next page type of interest, by using precedes/2. It also retrieves a link from its current page by calling link/2. The project names, the next page type, the label of the link, and the new page URL, are used in useful_link/4 to evaluate if the link leads the search closer to a citation database module. If it doesn't, then backtracking will cause a reevaluation of link/2 to obtain another link.

One technique used in useful_link/4 is to analyse the link's label, which is a string, to see if the page it leads to is of type NextPageType. This is achieved by matching the words in the label string against particular cue words. For some page types, the cue words are stored in cue_word/2 facts. For instance, the word ``Research'' is a cue word for pages of type research:

cue_word("Research",research).

For other page types, the cue words may not be stored in facts. For example, the project names are used as cue words to judge links to project pages. Other heuristics useful for constraining search are described in [19].

extract_modURL/2 uses similar techniques to useful_link/4 to find the database module URL on the researcher's page.

4.5 Discussion

This example illustrates how the techniques of knowledge representation and automated Web search can be used to implement complex processing of queries on lightweight deductive databases. Logic programming rules are used to represent a concept hierarchy, and to specify search behaviour over Web pages. The former maps a section name to its component project names, while the latter handles changes to the contents of the Web pages.

The concept hierarchy has the drawback that it needs to be updated if a research section, group, or project, is added or removed. However, this can be done automatically, i.e. by parsing the information on the research page into logic programming facts. In fact, a script has been written which parses the page containing information on sections, groups, and projects at the University of Melbourne into logic programming facts.

The facts and rules of the previous subsections reside in a separate module at the Web site. This module is loaded first, and other pages and database modules are loaded during query evaluation.

5. Related Work

5.1 Database on Web Pages

Dobson et al [10, 11] utilise additional HTML tags to embed relational databases (called lightweight databases) in Web pages. Essentially, an entity-relationship diagram is mapped onto the hypertext structure of the Web. Relationships between entities on different pages are specified by hypertext links, with attributes defining the relationships. Lightweight databases have been used to generate HTML documents, for indexing, and for databases spreading over several servers.

Our work is motivated by their use of relational databases. However, we employ deductive databases to provide more powerful modelling capabilities. We can also specify module composition in logic programming rules.

Sandewall [23] proposes the World-Wide Data Base, where a database consists of downloadable short text files, each file containing an object description. An object consists of properties, represented in a specialised language, and can reference other database files (i.e., objects), or HTML pages. Other object-oriented concepts, such as message-passing, are not employed. The main application discussed is HTML page generation, where objects store resources for the generation of pages. In particular, values of properties may be scripts (in LISP) that specify how to generate HTML expressions.

Our databases are deductive, whereas theirs are object-based. However, their objects can be expressed in our language. Also, they do not provide a uniform query language for their objects database, while we use logic programming for that purpose.

Their use of one file per object may incur heavy network transmission costs. In our approach, modules can contain multiple relations.

5.2 Comparison with Deductive Database Systems

Our approach to implementing lightweight deductive databases uses Prolog with LogicWeb extensions. However, existing deductive database systems [22, 15] differ from Prolog systems in several ways, including:

Query optimisation. Query processing often finds all answers to the query, i.e. the ``set at a time'' paradigm is more efficient than the ``tuple at a time'' paradigm of Prolog. To facilitate this, optimised bottom-up evaluation is often used, rather than the top-down evaluation of Prolog systems.
Restrictions on rules. The rules in deductive database systems are range-restricted, i.e. all variables that appear in the head of the clause must appear in the body. This implies that all facts must be ground. This removes the need for full unification, thereby increasing efficiency.
Another common restriction is that all terms in the program are variables or constants. This ensures that logical entailment is decidable. A logic programming language with this restriction is Datalog [22].

A major difference between our lightweight deductive databases and existing deductive database systems is that our databases are not updateable. If updates were possible, transaction processing, and concurrency control would also have to be available. We have not yet considered updates because the main focus of this work is structuring information on the Web.

5.3 Knowledge-Based Access to Information

The Information Manifold [18, 17] is a system for building a knowledge base representing the user's interests. This uses a combination of Horn rules and the CLASSIC knowledge representation language to describe information sources, and taxonomy relationships among them. The knowledge base is also used to process the results of searches submitted to multiple Web index servers.

Our framework supports knowledge-based query processing. Users can build their own knowledge base and query processing rules, perhaps on top of those provided by information servers.

Web pages have been generated from knowledge bases by using user profiles [16], and user queries [13]. We can achieve a similar functionality when the results of lightweight deductive database queries are Web documents.

Barcaroli et al [2] represents hypertext at a Web site using a knowledge base. Their system answers user queries by returning a sequence of links leading to the page containing the answer. In our approach, the information provider can provide a similar capability by mapping user queries to appropriate URLs.

These approaches typically make use of knowledge representation languages based on description logics in order to represent concept models. We restrict ourselves to logic programming since it is a versatile paradigm, used in a range of AI problems (e.g., expert systems and knowledge representation), and in deductive databases [24, 22].

Context logic, an extension of first order logic where sentences are true with respect to a given context, has been used to integrate databases [14]. Axioms are written which lift sentences from several contexts into a common one. This is similar to defining a new relation in terms of relations in other modules by using context switching, as shown in §3.1.

5.4 Web-related Logic Programming Systems

PiLLoW [8] is an Internet/Web programming library for logic programming systems. It contains a predicate to fetch Web pages, and page headers, which returns the information in term form. However, remote program modules are loaded by supplying the module's URL in a call to use_module/1.

In contrast, LogicWeb views both ordinary pages, and pages containing Prolog code, as program modules. This means that the text, and any Prolog code a page contains, are utilised in a uniform manner by goal evaluation. Moreover, LogicWeb has module combination operators.

In PiLLoW, modules (identified by a new MIME type) loaded into a local Prolog engine can be used to process queries posed via browser forms. The page containing the form is in a separate file from the Prolog module processing the form's queries. The LogicWeb system provides similar functionality, but without using a new MIME type. Also, it allows the form and the Prolog code to be situated in the same module.

The ECLiPSe HTTP-Client library [4] has a predicate to access Web pages, returning a page as a string. The same predicate can be used to download Prolog code, which is then compiled on the local system. LogicWeb has similar functionality, but expressed in terms of program modules which hide the underlying retrieval and compilation phases.

PiLLoW, the ECLiPSe library, and LogicWeb, augment existing logic programming systems with Web operators. LogiMOO [25] employs a substantially different method to execute remote Prolog code locally, based on MOOs (Multi User Domains - Object Oriented) using multi-threaded distributed blackboards. A client can create a local virtual blackboard, and connect it to a MOO. Then, the client can take objects from, or put objects onto, the MOO via this blackboard. This allows code to be retrieved and processed, using Prolog meta-programming capabilities. Also, JavaScript ``open URL'' commands returned to the client as answers to Prolog queries causes Web pages to be displayed.

6. Conclusions

The Web should be enhanced with richer information content, and more sophisticated processing capabilities [3]. Our lightweight deductive databases provide improved querying, automated searching, and sophisticated processing and extraction of structured information on the Web. Lightweight deductive databases can be easily combined, and extended during query processing, using well-established techniques from the areas of deductive databases and structured logic programming.

We conclude with several possible avenues for future work.

We have not dealt with integrity constraints in this paper. For example, the uniqueness property of primary keys must be maintained when databases are combined. There exists methods for combining knowledge bases with integrity constraints [1].

We have demonstrated the utility of a small set of composition operations, but it may be useful to explore other operations.

A modelling language (with constructs closely related to the application domain) could be used to describe particular kinds of knowledge. This may take the form of additional syntax, which can be translated into Prolog.

The structured information provided by lightweight deductive databases should be more readable by intelligent software agents [12], than plain HTML text. An interesting possibility is to use lightweight deductive databases to store agent functionality, which an agent could upload as it searches the Web.

Lightweight deductive databases must be manually generated at present. However, in some cases, lightweight deductive databases can be generated from text. For example, if citations are sufficiently marked-up in HTML, they can be converted into facts automatically. This idea is used in §4, to generate the logic programming representation of the research information.

Approaches to automatically generating new lightweight deductive databases from existing ones, given database schemas, should be investigated. Work on agent-based knowledge extraction (or data mining) from databases using Inductive Logic Programming [9], might be applicable.

References

1: C. Baral, J. Minker, S. Kraus. Combining Multiple Knowledge Bases. IEEE Transactions on Knowledge and Data Engineering, 3(2):208 - 221, June 1991.
2: C. Barcaroli, L. Iocchi, M. Lenzerini, and D. Nardi. Knowledge-Based Acess to the Network. In Fifth International World-Wide Web Conference Workshop on "Artificial Intelligence-based tools to help WWW users", May 1996.
3: T. Berners-Lee, Robert Cailliau, Ari Luotonen, Henrik Frystyk Nielsen, and Arthur Secret. The World-Wide Web. Communications of the ACM, 37(8):76 - 82, August 1994.
4: Ph. Bonnet, S. Bressan, L. Leth, and B. Thomsen. Towards ECLiPSe Agents on the INTERNET, In P. Tarau, A. Davison, K. De Bosschere, and M. Hermenegildo, editors, Proceedings of the 1st Workshop on Logic Programming Tools for INTERNET Applications, JICSLP'96, Bonn, September 1996.
5: A. Brogi. Program Construction in Computational Logic. PhD thesis, Universita di Pisa-Genova-Udine, 1993.
6: A. Brogi, P. Mancarella, D. Pedreschi, and F. Turini. Composition Operators for Logic Theories. In J.W. Lloyd, editor, Computational Logic, symposium Proceedings, pages 117 - 134. Springer-Verlag, 1990.
7: M. Bugliesi, E. Lamma, and P. Mello. Modularity In Logic Programming. Journal of Logic Programming, pages 443 - 502, 1994.
8: D. Cabeza, M. Hermenegildo, and S. Varma. The PILLOW/CIAO Library for INTERNET/WWW Programming, In P. Tarau, A. Davison, K. De Bosschere, and M. Hermenegildo, editors, Proceedings of the 1st Workshop on Logic Programming Tools for INTERNET Applications, JICSLP'96, Bonn, September 1996.
9: W.H.E. Davies and P. Edwards. Agent-Based Knowledge Discovery.
10: S. Dobson and V. Burrill. Towards Improving Automation in the World-Wide Web.
11: S.A. Dobson and V.A. Burrill. Lightweight Databases. Proceedings of the 3rd International World-Wide Web Conference, April 1995.
12: O. Etzioni and D. Weld. Intelligent Agents on the Internet: Fact, Fiction, and Forecast. IEEE Expert, 10(4):44 - 49, August 1995.
13: J. Euzenat. Knowledge Bases as Web Page Backbones. In Fifth International World-Wide Web Conference Workshop on "Artificial Intelligence-based tools to help WWW users" , May 1996.
14: A. Farquhar, A. Dappert, R. Fikes, and W. Pratt. Integrating Information Sources Using Context Logic. In On-line Working Notes of the AAAI Spring Symposium Series on Information Gathering from Distributed, Heterogeneous Environments, January 1995.
15: J. Harland and K. Ramamohanarao. An Aditi Implementation of a Flights Database. Applications of Logic Databases, 1994.
16: T. Hoppe, C. Kindermann, O.K. Paulus, and R. Tolksdorf. The MIHMA Project: a Web Information Service Based on Dsscription Logics. In Fifth International World-Wide Web Conference Workshop on "Artificial Intelligence-based tools to help WWW users", May 1996.
17: T. Kirk. Knowledge Based Access to Information on the World-Wide Web. In Fifth International World-Wide Web Conference Workshop on "Artificial Intelligence-based tools to help WWW users" , May 1996.
18: T. Kirk, A.Y. Levy, Y. Sagiv, and D. Srivastava. The Information Manifold. In On-line Working Notes of the AAAI Spring Symposium Series on Information Gathering from Distributed, Heterogeneous Environments , January 1995.
19: S.W. Loke and A. Davison. Logic Programming with the World-Wide Web. In Proceedings of the 7th ACM Conference on Hypertext, pages 235 - 245. ACM Press, March 1996.
20: S.W. Loke, A. Davison, and L. Sterling. CiFi: An Agent for Citation Finding on the World-Wide Web. In To appear in the Proceedings of the 4th Pacific Rim International Conference on Artificial Intelligence. Also available as the technical report 96/4. 1996.
21: J. J. Moreno-Navarro. Tuple Inheritance: A New Kind of Inheritance for (Constraint) Logic Programming (Extended Abstract). Full paper is here. In G. Levi and M. Martelli, editors, Proceedings of the 12th International Conference on Logic Programming, page 829. MIT Press, 1995.
22: K. Ramamohanarao and J. Harland. An Introduction to Deductive Database Languages and Systems. VLDB Journal, 3(2):107 - 122, April 1994.
23: E. Sandewall. Towards a World-Wide Data Base. Proceedings of the 5th International World-Wide Web Conference, May 1996.
24: L. Sterling and E. Shapiro. The Art of Prolog. MIT Press, 1994.
25: P. Tarau and K. De Bosschere. Virtual World Brokerage with BinProlog and Netscape, In P. Tarau, A. Davison, K. De Bosschere, and M. Hermenegildo, editors, Proceedings of the 1st Workshop on Logic Programming Tools for INTERNET Applications, JICSLP'96, Bonn, September 1996.

Footnotes

...queries
These categories are borrowed from the field of deductive databases [22].

...Melbourne
The home page of the University of Melbourne Computer Science department is here.

...Melbourne
The URL of this page is http://www.cs.mu.oz.au/alt/research/research.html.

| |