Abstract:
The World Wide Web has quickly established itself as the accepted standard
medium for supplying information, for example (hyper)texts,
images, and databases. But what about the supply of standardized
knowledge bases in the WWW, which could constitute a leap
toward knowledge reuse?
After a promising start to the WWW publication of
KIF/Ontolingua knowledge bases (e.g.
[7])
by Thomas Gruber right at the beginning of the
Web's expansion, little progress has been observable here since he left the
Knowledge Systems Laboratory at Stanford University. Likewise, the growth rate
of WWW knowledge represented (e.g. CGs, CycL, CommonKADS) or accessed
(e.g. KQML, April) in other proposed AI formats is not keeping
pace with the general Web expansion rate. This may be due to the complex structure
and insufficient support of these formats - in comparison, e.g., to the use of the
SQL database standard for HTTP/SQL gateways:
despite the distribution
of ANSI proposals for KIF and CGs, it seems that even in these cases there is
not enough `investment security' for knowledge publications.
Could, in this situation, a (HTML-)version of Horn logic, the established
basis of logic programming, play the role of a declarative WWW representation
standard?
First of all, the relational databases already available
in the Web can in principle be viewed as Datalog special cases of such
Horn-logic knowledge bases, normally restricted to ground facts and
non-recursive rules - apart from problems in connection with null values
(database schemas will be discussed later on). Moreover, the accumulated program libraries in pure Prolog (from Edinburgh to ISO) could easily be revised for Web
publication. Knowledge bases in modern logic languages such as
Prolog
or Gödel, or in functional-logic languages
[3] such as
Escher, Curry, or Relfun, could also be more or less easily reproduced
in Horn logic.
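This Datalog view of a relational table can be illustrated with a small Python sketch, in which each row of a relation becomes a ground Horn fact in Prolog syntax; the table name, columns, and values are invented for illustration:

```python
# Sketch: viewing a relational table as a Datalog special case of a
# Horn-logic knowledge base -- each row becomes a ground fact.
# The 'material' relation and its values are invented for illustration.

def rows_to_facts(predicate, rows):
    """Render each tuple of a relation as a ground Prolog-syntax fact."""
    facts = []
    for row in rows:
        args = ", ".join(str(a) for a in row)
        facts.append(f"{predicate}({args}).")
    return facts

# A toy 'material' relation with columns (name, density, recyclable).
material_rows = [("pe_hd", 0.95, "yes"), ("pvc", 1.4, "no")]
print(rows_to_facts("material", material_rows))
```

The converse direction - publishing such fact bases back into relational form - would be equally mechanical for the ground, non-recursive case discussed above.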
Finally, further translators between subsets of the above mentioned exchange
formats and Horn logic could be tackled: e.g. partial
KIF-Prolog translators have been written already; the Horn-logic reduction of sort
lattices will be
mentioned below. A desirable side effect would be that translation attempts
between concrete knowledge bases (via Horn logic) would increase the
comparability of new LP proposals. With the addition of
integrity constraints
[4]
Horn-logic knowledge bases can also be easily maintained interactively
[1].
From the
perspective of the user of relational databases this would be a transition to a
standard for deductive databases
[4]
as far as expressive power is concerned. Altogether, the number of
available Horn-logic WWW knowledge bases should most likely reach the critical
mass that would allow automatically processed queries to provide interesting
results to end-users.
Another important point arises from the Web distribution of knowledge bases.
The nowadays usual
fact-retrieval use of databases in the Web treats an SQL database
as a black box on a given server, which can be queried via a form. In
contrast to this is the document-retrieval use of (hyper)texts via search
engines, which use keyword combinations to find appropriate
Web pages arbitrarily distributed over servers. The
proposed Horn-logic knowledge bases in the WWW constitute an information category
between SQL databases and HTML documents. The following question arises: Is
it possible, on the one hand, to distribute this knowledge using the
paradigm of HTML documents and to access it, on the other hand, using the
paradigm of SQL-form queries or Prolog goals? In practice this would mean:
Web-distributed knowledge bases should not be queried by the (primitive,
since string-oriented) keywords of usual search engines; instead Prolog-like
goals should be used, perhaps via a form-interface, i.e. atoms, which
may contain logical (i.e. consistently bound) variables,
which, in the Prolog way, may be combined conjunctively (with a comma)
or disjunctively (with a semicolon), and which perhaps could also be negated
in a way suitable for the Web (global negation as failure
would make little sense in the `open Web world'; see discussion below).
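The querying style just described - goal atoms with consistently bound logical variables, combined conjunctively - can be given a minimal Python sketch over ground facts; all predicate names and facts are invented, and a real KB-search engine would of course resolve against distributed, possibly rule-containing KBs:

```python
# Minimal sketch: answering Prolog-like goal atoms over ground facts.
# Terms beginning with an uppercase letter act as logical variables,
# bound consistently across a conjunction of goals.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def unify(args, fact_args, env):
    """Match goal arguments against (ground) fact arguments under env;
    return extended bindings or None on failure."""
    if len(args) != len(fact_args):
        return None
    env = dict(env)
    for x, y in zip(args, fact_args):
        x = env.get(x, x)              # dereference a bound variable
        if is_var(x):
            env[x] = y
        elif x != y:
            return None
    return env

def solve(goals, facts, env=None):
    """Enumerate bindings satisfying a conjunction of goal atoms,
    each of the form (predicate, arg1, ..., argN)."""
    env = env if env is not None else {}
    if not goals:
        yield env
        return
    first, rest = goals[0], goals[1:]
    for fact in facts.get(first[0], []):
        e = unify(first[1:], fact, env)
        if e is not None:
            yield from solve(rest, facts, e)

# Invented KB: material(Name, Class) and recyclable(Name) ground facts.
FACTS = {"material": [("pe_hd", "thermoplastic"), ("pvc", "thermoplastic")],
         "recyclable": [("pe_hd",)]}

# Conjunctive goal: material(X, thermoplastic), recyclable(X)
answers = [e["X"] for e in solve([("material", "X", "thermoplastic"),
                                  ("recyclable", "X")], FACTS)]
print(answers)  # -> ['pe_hd']
```

Disjunction (the Prolog semicolon) could be modeled by solving each branch in turn and merging the resulting binding streams.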
Apart from the increased expressive power, compared to SQL, through variables
and structures in clauses (facts and rules), such knowledge bases
inherit an advantage from HTML documents - the possibility of seeing
a found answer to the current query (atom or string) in the context of
neighboring potential answers (atoms or strings), i.e. to associatively browse
(in knowledge bases or documents) instead of only communicating with a
deductive black box:
As an optional `explanation' of answer bindings it would be possible to
highlight both the atomic
facts to which a goal is reduced and
the rules used to derive it, either
within the knowledge bases or - more focused - within the predicate definitions
containing them.
Such use of Web-distributed knowledge could theoretically be implemented using keyword searches (on a lower level, atoms are also strings). In practice, its realization is an interesting challenge for logic programming.
Below follows a first sketch of a possible search engine for knowledge bases (KBs) Horn-published in the Web, henceforth called a KB-search engine.
This would be
useful even if it merely took the formal (Horn-logic) syntax of
knowledge-base entries into account better than present text-search engines do. But here, initial
semantic and inferential abilities will be added.
First, for typing each knowledge base, it would be sensible to
give a KB label to (or in) its HTML page, e.g. already in the
header,
so that only relevant pages, i.e. KB pages, are considered during a search
(conversely, normal text-search engines could ignore KB areas).
A variant, at least for Datalog, is the table tag of HTML 3.0,
with which predicate definitions can be presented as tables
(but a transition to the more general SGML might be advantageous for this
Web use as well, exploiting generic identifiers for KBs, predicates, clauses,
and below).
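How KB areas might be separated from ordinary page text can be sketched as follows; the <kb> element used here is a purely hypothetical label for illustration (HTML 3.0 defines no such element), and the page content is invented:

```python
# Sketch: extracting labeled KB areas from an HTML page so that a
# KB-search engine considers only them (and text-search engines could,
# conversely, ignore them). The <kb> tag is a hypothetical label.
import re

def extract_kb_areas(page):
    """Return the Horn-clause source of all <kb>...</kb> areas."""
    return [area.strip() for area in re.findall(r"<kb>(.*?)</kb>", page, re.S)]

PAGE = """<html><head><title>RTPLAST excerpt</title></head><body>
Ordinary text, ignored by the KB-search engine.
<kb>
material(pe_hd, thermoplastic).
recyclable(pe_hd).
</kb>
</body></html>"""

print(extract_kb_areas(PAGE))
```

An SGML-based variant would replace the ad-hoc pattern match by proper parsing against a DTD with generic identifiers for KBs, predicates, and clauses.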
In any case
it is useful, analogously to an SQL database schema, to declare the signatures
of n-ary predicates in order to establish the meaning of the n arguments. The
sorts usable here can themselves be defined as unary predicates in sort
lattices (e.g. via single-premise rules or binary subsumes relationships between the predicates).
Now, before using a found predicate definition, the KB-search engine
first tests its signature compatibility with respect to the - perhaps itself
sorted - goal atom. This
semantic prefiltering avoids elementary predicate ambiguities, such
as those known from the corresponding tiresome keyword ambiguities
of today's text-search engines. In this way it is in principle possible to
attain higher precision using formal KBs than using natural-language
texts. Modeled on the joint FAQ elaboration for newsgroups, expert groups would be able, again with the help of the Internet, to
agree on their special terminology and their sort sublattices, with interfaces
to related
knowledge areas. Gradually it would thus be possible to reduce the
ambiguity of predicate names.
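The semantic prefiltering step can be sketched in Python, assuming the sort lattice is given as binary subsumes pairs; all sort and predicate names are invented, and the compatibility policy (goal sort at or below the declared sort) is just one simple choice:

```python
# Sketch of semantic prefiltering: before resolving against a found
# predicate definition, the KB-search engine checks the goal's argument
# sorts against the declared signature. The sort lattice is given as
# binary subsumes pairs (super-sort, sub-sort); all names are invented.

SUBSUMES = {("material", "thermoplastic"), ("material", "thermoset")}

def sort_le(sub, sup):
    """Reflexive-transitive check: is `sub` at or below `sup`?"""
    if sub == sup:
        return True
    return any(s == sup and sort_le(sub, b) for (s, b) in SUBSUMES)

def signature_compatible(goal_sorts, declared_sorts):
    """Prefilter: arities match and each goal sort is subsumed by the
    corresponding declared sort (one simple compatibility policy)."""
    return len(goal_sorts) == len(declared_sorts) and all(
        sort_le(g, d) for g, d in zip(goal_sorts, declared_sorts))

# Declared signature: density(material, number); goal uses a sub-sort.
print(signature_compatible(("thermoplastic", "number"), ("material", "number")))  # True
print(signature_compatible(("number",), ("material", "number")))                  # False
```

A definition failing this test is skipped without any resolution step, which is exactly what distinguishes it from mere keyword matching on predicate names.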
The inferential kernel problem of a KB-search engine in the Web
corresponds to the problem of OR-parallel architectures such
as Aurora
[12],
whose resources have to be balanced between local search
(normally sequential) and global
search (parallel, communication-intensive). The
KB-search engine has to suitably interleave resolution on a (found)
KB page with the Web supply of (further) KB pages. Here is also a connection
to the theme of
modular/contextual LP
[5]:
When should a goal be proven locally in the
actual KB page (module),
and when should the search proceed globally to other KB pages (modules)? When should
backtracking cross page (module) boundaries? While a
page-local sequence of resolution steps is useful mainly for recursive rules,
the global search
process has to spread continuously, in parallel, over the KB pages (e.g.
with algorithms such as fish-search
[13]).
Local non-terminations then cannot endanger the global success of a
query: Even if it could be presupposed that all knowledge bases published
in the Web would be terminating individually, there could still be
non-terminations caused by the interaction of distributed knowledge bases, e.g.
via two distributed co-recursive rules. Since it is difficult to
capture the (model theoretic) semantics of a predicate definition even
with signature declarations if its clauses are distributed over
several knowledge bases, it will be beneficial to agree on more or
less strong restrictions here. The strongest locality principle
would completely forbid predicate distribution, a middle one would only allow
the distribution of facts, and the weakest would also tolerate
distributed non-recursive rules. Furthermore, the important question
arises whether, for uniformity of the semantics, the clauses
within a KB page should also be interpreted according to pure Horn logic in an
OR-parallel (breadth-oriented) manner instead of with the
Prolog-like use of backtracking.
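The global spread over KB pages, in contrast to purely page-local resolution, can be sketched as a depth-bounded breadth-first exploration in the spirit of fish-search (here sequentialized; page names, links, and facts are invented, and real fish-search additionally ranks pages by relevance):

```python
# Sketch of the global search process: KB pages are explored
# breadth-first (an OR-parallel, fish-search-like spread, here
# sequentialized), so that a locally non-terminating page cannot
# starve the rest of the search. The page graph and facts are invented.
from collections import deque

PAGES = {  # page -> (local ground facts, linked pages)
    "a.html": ([("recyclable", "pe_hd")], ["b.html", "c.html"]),
    "b.html": ([], ["a.html"]),                # cycle: must not loop
    "c.html": ([("recyclable", "pp")], []),
}

def global_search(goal_pred, start, depth=3):
    """Spread over linked KB pages up to `depth`, collecting facts
    matching the goal predicate (a stand-in for local resolution)."""
    seen, frontier, found = {start}, deque([(start, depth)]), []
    while frontier:
        page, d = frontier.popleft()
        facts, links = PAGES[page]
        found += [f for f in facts if f[0] == goal_pred]  # local step
        if d > 0:
            for nxt in links:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d - 1))
    return found

print(global_search("recyclable", "a.html"))
```

In a genuinely OR-parallel setting the frontier pages would be processed concurrently, with the `seen` set preventing re-exploration across the co-recursive link between the first two pages.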
In any case one would rather expect solutions to surface Lycos-style, in groups,
than Prolog-style, via individual `more' requests; their order could also
reflect certainty factors, starting from the facts used. Due to the openness
and dynamics
of the WWW, the issue of logical `completeness' of solution sets would
hardly arise:
the KB-search engine even has to cut off search
processes that are still running (and possibly non-terminating) if no further
solution group is requested; an SQL-type (total-)set orientation would thus not
be sensible here.
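The group-wise surfacing of solutions, with the search simply abandoned once no further group is requested, can be sketched with a Python generator; the infinite producer stands in for a possibly non-terminating Web search:

```python
# Sketch: solutions surface in groups, and the producing search is
# abandoned -- not run to completion -- when no further group is
# requested. The unbounded generator stands in for a possibly
# non-terminating distributed search process.
from itertools import count, islice

def solutions():
    for i in count():            # potentially non-terminating search
        yield f"answer{i}"

def next_group(gen, size=3):
    """Deliver the next group of solutions, leaving the search suspended."""
    return list(islice(gen, size))

gen = solutions()
print(next_group(gen))  # first group of three solutions
# Simply not requesting another group implicitly cuts off the search;
# an SQL-style total-set evaluation would never return here.
```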
Finally, it is necessary to examine how the indexing techniques of
text-search engines, databases, and Prolog can be transferred to the
KB-search engine. Within a KB page the signature declarations preceding the
predicate definitions (`procedures') can play the role of
indexed HTML headings. A finer KB indexing, down to the level of single
clauses, would be conceivable analogously to complete text indexing, with
advantages and disadvantages similar to those of Alta Vista.
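Such clause-level indexing can be sketched in the style of Prolog's first-argument indexing, keying each clause by its predicate and first argument (the clauses themselves are invented):

```python
# Sketch of clause-level KB indexing, analogous to complete text
# indexing: clauses are keyed by predicate name and first argument
# (Prolog-style first-argument indexing). The clauses are invented.

def index_clauses(clauses):
    """Build an index mapping (predicate, first_arg) to clause lists."""
    idx = {}
    for pred, *args in clauses:
        key = (pred, args[0] if args else None)
        idx.setdefault(key, []).append((pred, *args))
    return idx

KB = [("material", "pe_hd", 0.95),
      ("material", "pvc", 1.4),
      ("recyclable", "pe_hd")]

idx = index_clauses(KB)
print(idx[("material", "pvc")])  # only the matching clause is retrieved
```

A goal with a bound first argument then touches only its index entry rather than scanning the whole KB page, mirroring how a text-search engine consults its inverted index instead of rereading documents.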
The KB-search engine outlined above works mainly interpretively on KB sources, which can also be used with normal browsers. This permits, apart from black-box queries, also directly inspecting and configuring knowledge bases, much like text building blocks. Only by offering this direct manipulation of (Horn-logic) standardized knowledge do knowledge bases have a certain chance of becoming common property in the way of EXCEL tables: for this, not only conducting knowledge-base queries (spreadsheet calculations) is important, but also helping with the construction, change, combination, and bidirectional text coupling of knowledge bases (spreadsheets). This could be supported by the hypertext mechanisms of the Web. For instance, the submodules of a knowledge base would have to be accessible through the KB-search engine as well as interactively via a special HTML link.

If, due to efficiency considerations, the compilation of a Horn-logic knowledge base becomes necessary, direct manipulation could still be simulated, e.g. via back-references to the source and automatic recompilation after changes. In contrast to an even more efficient native-code compilation, a WAM compilation would preserve the advantage of pseudo-code transmission and client-side emulator execution first used by Java.
Due to the comparatively simple formalizability of technical knowledge,
it is sensible to first construct Horn-logic knowledge bases for this area.
In the VEGA project we have done this mainly for subareas of material
science (e.g.
[2]).
However, knowledge bases such as RTPLAST are
published mainly as pure ASCII files (in Relfun syntax) in the Web. It is
planned to include more Horn-logic knowledge bases, e.g. to compare
functional-logic programs. However, it is still open which operational
variant of Horn logic is most suitable for the publication of
practical knowledge bases in the Web: experience encourages
exploiting the textual order of clauses (OR sequentiality) and of their premises
(AND sequentiality), while the distributed Web search suggests an at least
OR-parallel interpretation. After the decisions necessary here, mainly
a more exact specification of the HTML additions for Horn-logic knowledge
bases and a revision of the search engine operating on them have to be completed before a prototype implementation (e.g. in Java) can be begun.
For reasons of acceptance, the efficiency of the KB-search engine
plays a major role of course; it could be increased both by progress in LP (e.g.
better OR-parallel strategies) and in WWW technology (e.g. better efficiency of
Java).
Interesting possibilities for WWW knowledge bases are also offered, under the heading of intranets, by the trend toward organization-internal information supply on the basis of Web technology. In analogy to a central corporate memory [8], the intranet knowledge bases of a networked organization (via LAN or WAN) would become the knowledge component of a distributed corporate memory. The information deficits in large organizations, which cost time and money, show that a KB-search engine would be useful even for intranet knowledge bases; because of the smaller knowledge volumes, it would be able to work faster here than in the open Internet.

If organizations use the externally agreed-upon (Horn-logic) standard as their internal representation format as well, they can move knowledge bases back and forth between their intranet and the public Internet as needed: inward, to take over or adapt new know-how; outward, for demos and perhaps (cryptographic) sales of their expertise (using an Internet payment standard). Analogously, a unified representation will permit flexible alliances in knowledge-intensive industries.
Besides the technical issues discussed in the body of this paper, the building up of
knowledge bases in the World Wide Web has a strong `community' component:
While interest in providing and using Inter- or intranet KBs is growing,
we need to converge on a standard representation format.
Since Horn logic is already the language of choice for many non-Web
KB publications and lies at the core of many LP extensions, it looks like the
right place to start. If Web-KB providers use Prolog syntax,
their KBs will be translatable into the ultimate
format indexed and retrieved by the envisaged Horn-logic KB-search engine,
since a sufficient number of `legacy' Prolog KBs also requiring
such a translator already exists.
When attempting to build reusable
Web Horn KBs, the harder part will probably be common
`ontological' decisions concerning compatible definitions of predicate names.
However, the sort hierarchies needed for structuring non-trivial domains
can, for uniformity, be regarded as unary predicates,
again defined by (simple) Horn rules.
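This reduction of sorts to unary predicates defined by simple Horn rules can be sketched as follows; the sort names, the fact, and the single-premise rule (in Prolog notation, material(X) :- thermoplastic(X)) are invented illustrations:

```python
# Sketch: a sort hierarchy as unary predicates defined by single-premise
# Horn rules, e.g. material(X) :- thermoplastic(X). Names are invented.

RULES = {"thermoplastic": "material"}   # sub-sort predicate -> super-sort
FACTS = {("thermoplastic", "pe_hd")}

def holds(pred, individual):
    """Does pred(individual) follow from the facts plus the sort rules?"""
    if (pred, individual) in FACTS:
        return True
    return any(sup == pred and holds(sub, individual)
               for sub, sup in RULES.items())

print(holds("material", "pe_hd"))   # derived via the rule
print(holds("material", "steel"))   # not derivable
```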
Once a search engine hits enough clauses on typical goals, a
self-accelerating effect in the growth of Web KBs may start,
similar to the ones we are witnessing for other information categories
in the Web.
This paper has attempted to show that the LP community is well-prepared to
venture on such a major Web effort.
Among the approaches to create a Web infrastructure for reusing
logic programs or KBs, the present proposal comes closest to the one
independently developed in
[11]:
``Lightweight deductive databases'' are deductive databases incorporated
into Web pages to provide a source of distributed structured information.
Their principal features, distributed maintenance, extensibility,
and reusability, as well as notions
from modular LP are common with our proposal.
However, like other search engines, we stick to server-side processing
as the default. Additionally, we stress purity in the sense
that no extra operators, e.g. for invoking goals in other modules, are used.
Instead, a pure Horn-logic KB should be loadable unchanged into a
KB-tagged Web page, where the KB-search engine deals with the
modular structure of pages much like textual search engines do.
Purity also supports OR-parallelism as a paradigm to exploit
Web-distributed KBs, implementable by `agent replication' such as in the
fish-search algorithm.
Finally, besides deducing answer bindings, we feel it is important that
a KB-search engine can also `explain' solutions by highlighting
clauses used for a deduction in the context of their KB pages.
Altogether we do not propose a new LP language for the Web but the pragmatic
reuse of Horn-logic KBs - in a Prolog-like syntax - by a new search engine.
Once some experience has been accumulated with Web-searching such KBs,
Web-oriented - but still pure - extensions of the Horn-logic kernel language
could be carefully introduced.
But before going into the nitty-gritty details - and quarrels -
of defining a `Web LP' language, we should experiment with one or
more search engines for a fixed (Horn-)language.
[2] H. Boley, U. Buhrmann, and Chr. Kremer. Towards a Sharable
Knowledge Base on Recyclable Plastics. In: J. K. McDowell and K. J. Meltsner,
Eds., Knowledge-Based Applications in Materials Science and Engineering, pp. 29-42, TMS, 1994.
http://www.dfki.uni-kl.de/~vega/relfun.html
(Examples: RTPLAST)
[3] H. Boley. Extended Logic-plus-Functional Programming. In: L.-H. Eriksson, L. Hallnäs, and P. Schroeder-Heister, Eds., Proc. of the 2nd International Workshop on Extensions of Logic Programming - ELP '91, Springer LNAI 596, 1992.
[4] F. Bry, R. Manthey, and H. Schütz. Deduktive Datenbanken. In: N. E. Fuchs, Ed., KI - Themenheft Logische Programmierung, 3/96, Interdata Verlags GmbH, Sept. 1996.
[5] M. Bugliesi, E. Lamma, and P. Mello. Modularity in Logic Programming. Journal of Logic Programming, 19/20, pp. 443-502, 1994.
[6] D. Cabeza and M. Hermenegildo. html.pl: A Simple HTML Package for Prolog and CLP Systems - Description and User's Manual (Version 96.1.1). Computer Science Department, Technical University of Madrid (UPM), March 1996.
[7] Th. R. Gruber and G. R. Olsen. An Ontology for Engineering
Mathematics. In: J. Doyle, P. Torasso, and E. Sandewall, Eds., Fourth
International Conference on Principles of Knowledge Representation and
Reasoning, Gustav Stresemann Institut, Bonn, Germany, Morgan Kaufmann, 1994.
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/index.html
(Reference Documents: The Engineering Math Ontologies)
[8] K. Hinkelmann and O. Kühn. Revising and Updating a Corporate Memory. In: Proc. of the European Symposium on Validation and Verification of Knowledge-based Systems - EUROVAV-95, June 1995.
[9] R. A. Kowalski. Logic Programming in Artificial Intelligence. 12th International Joint Conference on Artificial Intelligence - IJCAI'91, pp. 596-603, 1991.
[10] S. W. Loke and A. Davison. Logic Programming with the World-Wide Web, Proc. of the 7th ACM Conference on Hypertext - Hypertext'96, ACM Press, March 1996.
[11] S. W. Loke, A. Davison, and L. Sterling. Lightweight Deductive Databases on the World-Wide Web, P. Tarau, A. Davison, K. De Bosschere, and M. Hermenegildo, Eds., Proc. of the 1st Workshop on Logic Programming Tools for INTERNET Applications, JICSLP'96, Bonn, Sept. 1996.
[12] E. Lusk, D. H. D. Warren, S. Haridi, et al. The Aurora Or-Parallel Prolog System. In: International Conference on Fifth Generation Computer Systems 1988, ICOT, Tokyo, Japan, Nov. 1988.
[13] R. D. J. Post and P. M. E. De Bra. Information Retrieval in the World Wide Web: Making Client-based Searching Feasible. In: Proc. of the First International WWW Conference, Geneva, May 1994.