
# Computational Logic: Abstract Interpretation of Logic Programs

## Introduction

[Material partly from Cousot, Nielson, Gallagher, Sondergaard, Bruynooghe, and others]

• Many CS problems related to program analysis / synthesis

• Prove that some property holds for program
(program analysis)

• Alternatively: derive properties which do hold for program
(program analysis)

• Given a program P, generate a program P' which is
• in some way equivalent to P
• behaves better than P w.r.t. some criteria
(program analysis / synthesis)

• Standard Approach:
• identify that some invariant holds, and
• specialize the program for the particular case

## Program Analysis

• Frequent in compilers although seldom treated in a formal way:
• "code optimization",
• "dead code elimination",
• "code motion",
• ...
[Aho, Ullman 77]

• Often referred to as "dataflow analysis"

• Abstract interpretation provides a formal framework
for developing program analysis tools

• Analysis phase + synthesis phase
Abstract Interpretation + Program Transformation

## What is abstract interpretation?

• Consider detecting that one branch will not be taken in a conditional:

if (condition) then (branch 1) else (branch 2)

• Exhaustive analysis in the standard domain: non-termination (infinitely many concrete inputs to try)

• Human reasoning about programs uses abstractions or approximations:
signs, order of magnitude, odd/even, ...

• Basic Idea: use approximate (generally finite) representations of computational objects to make the problem of program dataflow analysis tractable

• Abstract interpretation is a formalization of this idea:
• define a non-standard semantics which can approximate the meaning
or behaviour of the program in a finite way

• expressions are computed over an approximate (abstract) domain rather than the concrete domain (i.e., meaning of operators has to be reconsidered w.r.t. this new domain)

## Comparison to other methods

• Very general:
can be applied to any language with a well-defined (procedural or declarative) semantics
• Automatic - (vs. proof methods)
• Static - not all possible runs actually tried (vs. model checking)
• Sound - no possible run omitted (vs. debugging)

## Example: integer sign arithmetic

• Consider the domain Z (the integers)

• and the multiplication operator * : Z × Z → Z

• We define an "abstract domain": D_α = { -, + }

• Abstract multiplication ⊗, defined by the rule of signs:
+ ⊗ + = +    + ⊗ - = -    - ⊗ + = -    - ⊗ - = +

• This allows us to reason, for example, that X * X is never negative

• Some observations:
• The basis is that whenever we have z = x * y then:
if x, y are approximated by a, b
then z is approximated by a ⊗ b

• It is important to formalize this notion of approximation,
in order to be able to prove an analysis correct

• Approximate computation is generally less precise but faster (tradeoff)
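The rule of signs above can be sketched in a few lines. This is a Python sketch, not from the original slides; the function names and the string encoding of signs are ours.

```python
# Two-valued sign domain {+, -} with "rule of signs" abstract multiplication.

def alpha(n: int) -> str:
    """Abstract a nonzero integer to its sign."""
    assert n != 0, "this first domain only covers nonzero integers"
    return '+' if n > 0 else '-'

def abs_mul(a: str, b: str) -> str:
    """Abstract multiplication: + if the signs agree, - otherwise."""
    return '+' if a == b else '-'

# Safety: if x, y are approximated by alpha(x), alpha(y),
# then x * y is approximated by abs_mul(alpha(x), alpha(y)).
for x in (-3, -1, 2, 7):
    for y in (-5, 4):
        assert alpha(x * y) == abs_mul(alpha(x), alpha(y))

# We can conclude X * X is never negative without knowing X:
assert abs_mul('+', '+') == '+' and abs_mul('-', '-') == '+'
```

Note that the check is purely symbolic: only the four abstract cases are examined, not the infinitely many concrete products.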

## Example: integer sign arithmetic (Contd.)

• Again, Z (the integers)

• and * : Z × Z → Z

• Let's define a more refined "abstract domain": D_α = { -, 0, + }

• Abstract multiplication ⊗ extended with:
0 ⊗ a = a ⊗ 0 = 0, for any a

• This now allows us to reason that X * 0 is zero

• Some observations:
• There is a degree of freedom in defining different abstract operators and domains

• The minimal requirement is that they be "safe" or "correct"

• Different "safe" definitions result in different kinds of analyses

## Example: integer sign arithmetic (Contd.)

• Again Z (the integers)

• We cannot abstract addition over { -, 0, + } because we wouldn't know how to represent the result of + ⊕ -
(i.e. our abstract addition would not be closed)

• New element "⊤" (supremum): approximation of any integer

• New "abstract domain": D_α = { -, 0, +, ⊤ }

+ ⊕ - = ⊤,  ⊤ ⊕ a = ⊤,  ... (the remaining cases follow the usual rules)

• We can now reason that a sum of squares, e.g. X * X + Y * Y, is never negative
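A minimal sketch of this extended domain, assuming our own encoding of ⊤ as the string 'T' (a Python sketch, not from the slides):

```python
# Sign domain {-, 0, +, T} where T ("top") approximates any integer,
# which makes abstract addition closed.

TOP = 'T'

def abs_add(a: str, b: str) -> str:
    """Abstract addition over {-, 0, +, T}."""
    if a == '0': return b
    if b == '0': return a
    if a == b:   return a      # (+)+(+) = +, (-)+(-) = -
    return TOP                 # (+)+(-), or anything involving T: unknown

def abs_mul(a: str, b: str) -> str:
    """Abstract multiplication extended with 0 and T."""
    if a == '0' or b == '0': return '0'
    if a == TOP or b == TOP: return TOP
    return '+' if a == b else '-'

# A sum of squares is never negative: every square is '0' or '+',
# and abs_add on {'0', '+'} never yields '-'.
for x in ('-', '0', '+'):
    for y in ('-', '0', '+'):
        assert abs_add(abs_mul(x, x), abs_mul(y, y)) in ('0', '+')
```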

## Important observations

• In addition to the imprecision due to the coarseness of D_α, the abstract versions of the operations (dependent on D_α) may introduce further imprecision

• Thus, the choice of abstract domain and the definition of the abstract operators are crucial

## Issues in Abstract Interpretation

• Required:
• Correctness - safe approximations: because most "interesting" properties are undecidable the analysis necessarily has to be approximate. We want to ensure that the analysis is "conservative" and errs "on the safe side"

• Termination - compilation should definitely terminate
(note: not always the case in every day program analysis tools!)

• Desirable - "practicality":
• Efficiency - in practice finite analysis time is not enough: finite and small

• Accuracy - of the collected information: depends on the appropriateness of the abstract domain and the level of detail to which the interpretation procedure mimics the semantics of the language

• "Usefulness" - determines which information is worth collecting

• The first two received the most attention initially (understandably)

• Last three recently studied empirically (e.g., for logic programs)

## Safe Approximations

• Basic idea in approximation: for some property, represented as the set P of all values that satisfy it, we want to show that S ⊆ P

Alternative: construct a (simpler) set S' such that S ⊆ S', and prove S' ⊆ P

then, S' is a safe approximation of S

• Approximation on functions: for some property of the results of a function f we want to show that it holds

• A function f'

is a safe approximation of f if every f(x) is approximated by f'(x)

## Approximation of the meaning of a program

• Let the meaning of a program be a mapping m : D → D from input to output, input and output values in a "standard" domain D

• Let's "lift" this meaning to map sets of inputs to sets of outputs:

M : 2^D → 2^D,   M(S) = { m(x) | x ∈ S }

where 2^S denotes the powerset of S

• A function

M_α : 2^D → 2^D

is a safe approximation of M if ∀S ⊆ D : M(S) ⊆ M_α(S)

• Properties can be proved using M_α instead of M

## Approximation of the meaning of a program (Contd.)

• For some property, represented as a set P, we want to show that
M(S) ⊆ P, for some inputs S

• We show instead that
M_α(S) ⊆ P, for those inputs S

• Since M(S) ⊆ M_α(S) ⊆ P, the property holds for M
(Note: abuse of notation - M does not work on abstract values)

• As long as M_α is monotonic: it is enough to analyze any S' with S ⊆ S'

• And since M(S) ⊆ M_α(S) ⊆ M_α(S'), then:
M(S) ⊆ P, for the inputs S, whenever M_α(S') ⊆ P

## Abstract Domain and Concretization Function

• The domain 2^D can be represented by an "abstract" domain D_α of finite representations of (possibly) infinite objects in 2^D

• The representation of 2^D by D_α is expressed by a (monotonic) function γ : D_α → 2^D called a concretization function:

such that γ(d) is the largest element (under ⊆) of 2^D that d describes
[ 2^D is obviously a complete lattice ]

e.g. in the "signs" example, with D_α = { -, 0, + }, γ is given by

γ(-) = { x ∈ Z | x < 0 }
γ(0) = { 0 }
γ(+) = { x ∈ Z | x > 0 }

• we define d ⊑ d' iff γ(d) ⊆ γ(d')

## Abstraction Function

• We can also define (not strictly needed) a (monotonic) abstraction function α : 2^D → D_α

α(S) is the "least" element of D_α that describes S
[ under a suitable ordering ⊑ defined on the elements of D_α ]

e.g. in the "signs" example (with ⊤ included),

α({ 2, 5 }) = +     (and not ⊤)
α({ -3 }) = -       (and not ⊤)
α({ -1, 1 }) = ⊤

## Abstract Meaning and Safety

• We can now define an abstract meaning function M_α : D_α → D_α,

which is then safe if ∀d ∈ D_α : M(γ(d)) ⊆ γ(M_α(d))

• We can then prove a property of the output of a given class of inputs represented by d by proving that all elements of γ(M_α(d)) have such property

• E.g. in our example, a property such as "if this program takes a positive number it will produce a negative number as output" can be proved by showing M_α(+) = -

## Proving properties in the abstract

• Generating M:
• obtained from program and predefined semantics of operators
(+, *, ...)

• Automatic analysis:
M_α should be obtainable from program and semantics of abstract operators (compositional properties)

• "If this program takes a positive number it will produce a negative number as output":
• M_α(+) = -, where + describes the input, - the output
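The claim that M_α is obtainable compositionally from the abstract operators can be illustrated with a tiny abstract evaluator. This is a Python sketch (not from the slides); the nested-tuple expression encoding and all names are ours.

```python
# Compositional abstract evaluation over the sign domain {-, 0, +, T}.

def abs_mul(a, b):
    if a == '0' or b == '0': return '0'
    if 'T' in (a, b): return 'T'
    return '+' if a == b else '-'

def abs_neg(a):
    return {'+': '-', '-': '+'}.get(a, a)   # 0 and T map to themselves

def abs_eval(expr, env):
    """Evaluate a nested-tuple expression abstractly; strings are variables."""
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    if op == '*':   return abs_mul(abs_eval(args[0], env), abs_eval(args[1], env))
    if op == 'neg': return abs_neg(abs_eval(args[0], env))
    raise ValueError(op)

# "If this program takes a positive number it produces a negative one":
# for the (hypothetical) program  f(x) = -(x * x)  we prove it abstractly.
assert abs_eval(('neg', ('*', 'x', 'x')), {'x': '+'}) == '-'
```

The evaluator never looks at concrete numbers: it mirrors the structure of the program and replaces each concrete operator by its abstract counterpart, which is exactly the compositionality the slide requires.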

## Collecting Semantics

• "Input-output" semantics often too coarse for useful analysis: information about the "state" at program points generally required ⇒ extended semantics

• Program points can be reached many times, from different points, and in different "states" ⇒ "collecting" ("sticky") semantics

• Analysis often computes a collection of abstract states for a program point

• Often more efficient to "summarize" states into one which gives the best overall description ⇒ lattice structure in abstract domain

## Lattice Structure

• The ordering ⊆ on 2^D induces an ordering ⊑ on D_α ("approximates better")

E.g., to describe { 2, 5 } we can choose either + or ⊤,
but γ(+) = { x | x > 0 } and γ(⊤) = Z, and
since we have γ(+) ⊆ γ(⊤), i.e., + approximates { 2, 5 } better than ⊤,
it is more precise

• It is generally required that (D_α, ⊑) be a complete lattice

• Therefore, for all S ⊆ D_α there exists a unique least upper bound ⊔S - i.e., the least d such that ∀d' ∈ S : d' ⊑ d

• Intuition: given a set S of approximations of the current "state" at a given point in a program, ⊔S ensures that it is the "best overall" description for the point:
• ⊔S approximates everything the elements of S approximate
• ⊔S is the best such approximation in D_α

## Example: integer sign arithmetic

• We consider D_α = { -, 0, +, ⊤ }

• We add ⊥ (infimum) so that ⊔∅ exists and to have a complete lattice:

D_α = { ⊥, -, 0, +, ⊤ }

• (Intuition: ⊥ describes no value at all -
it represents a program point that is never reached)

• The concretization function has to be extended with

γ(⊥) = ∅

• The lattice is then given by: ⊥ below the pairwise incomparable -, 0, +, all below ⊤
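The least upper bound on this five-element lattice can be sketched directly. A Python sketch (ours), encoding ⊥ as 'bot' and ⊤ as 'T':

```python
# Least upper bound on the lattice  bot < {-, 0, +} < T.

def lub2(a: str, b: str) -> str:
    """Join of two abstract values."""
    if a == 'bot': return b
    if b == 'bot': return a
    return a if a == b else 'T'

def lub(S):
    """Least upper bound of a set of abstract values (lub of the empty set is bot)."""
    r = 'bot'
    for d in S:
        r = lub2(r, d)
    return r

# Summarizing the descriptions reaching a program point:
assert lub({'+'}) == '+'
assert lub({'+', '0'}) == 'T'   # imprecise: this lattice cannot say "non-negative"
assert lub(set()) == 'bot'      # unreachable point
```

Note the imprecision in the second case, which motivates the richer powerset domain on the next slide.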

## Example: integer sign arithmetic (Contd.)

• To make ⊔ more meaningful we consider D_α = 2^{ -, 0, + } (sets of basic signs, ordered by ⊆)

• The lattice is then the powerset lattice: ∅ (= ⊥) below the singletons, up to { -, 0, + } (= ⊤)

• { -, 0 } accurately represents a program point where a variable can be negative or zero

## The Galois Insertion Approach

• In the following, we will refer to 2^D simply as D

• (Collecting) program semantics is often given as lfp(S_P) (the least d s.t. S_P(d) = d, S_P being the program-dependent semantic function on D)

• Thus, we need to relate this fixpoint to (that of) the approximate semantic function S_P^α (which approximates S_P and operates on elements of an abstract domain D_α)

• Assume: D and D_α are complete lattices; α : D → D_α and γ : D_α → D are monotonic functions. The structure (D, α, D_α, γ) is called a Galois Insertion if:

∀d ∈ D : d ⊑ γ(α(d))     and     ∀d_α ∈ D_α : α(γ(d_α)) = d_α

• Safe approximation, defined now in terms of a Galois insertion:
Let (D, α, D_α, γ) be a Galois insertion. S_α safely approximates S iff ∀d_α ∈ D_α : α(S(γ(d_α))) ⊑ S_α(d_α)

• Fundamental Theorem [Cousot]: Given a Galois insertion (D, α, D_α, γ), and two (monotonic) functions S and S_α, then if S_α approximates S, lfp(S_α) approximates lfp(S)
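The two Galois insertion conditions can be checked exhaustively for the sign domain over a finite sample of Z. A Python sketch; the sampling and all names are ours:

```python
# Checking the Galois insertion conditions for the sign domain {bot, -, 0, +, T}.

Z = range(-3, 4)  # a finite sample of the integers, for illustration only

def gamma(d):
    return {'bot': set(),
            '-': {x for x in Z if x < 0},
            '0': {0},
            '+': {x for x in Z if x > 0},
            'T': set(Z)}[d]

def alpha(S):
    """Least abstract value describing the set S (smallest gamma that covers it)."""
    for d in ('bot', '-', '0', '+', 'T'):
        if S <= gamma(d):
            return d

# alpha(gamma(d)) = d: no two abstract values describe the same concrete set.
for d in ('bot', '-', '0', '+', 'T'):
    assert alpha(gamma(d)) == d

# S <= gamma(alpha(S)): abstraction never loses concrete values (extensivity).
for S in (set(), {1, 2}, {0}, {-1, 1}):
    assert S <= gamma(alpha(S))
```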

## Termination: conditions on D_α and S_P^α

• The question is whether lfp(S_P^α) is finitely computable

• The abstract operator S_P^α operates on elements of an abstract domain D_α,
which we have required to be a complete lattice,
and it is monotonic, therefore

lfp(S_P^α) = (S_P^α)^k(⊥) for some k, which we would like to be finite
(i.e. we would like the Kleene sequence ⊥ ⊑ S_P^α(⊥) ⊑ (S_P^α)²(⊥) ⊑ ... to be finite)

• Recalling the characteristics of fixpoints on lattices, the Kleene sequence will be finite in cases including:
• D_α is finite

• D_α is ascending chain finite (no infinite strictly ascending chains)
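Kleene iteration itself is only a few lines. A Python sketch (ours), with a toy monotonic F on a finite powerset lattice so that the ascending chain is guaranteed finite:

```python
# Computing lfp(F) by Kleene iteration from bottom, assuming F is
# monotonic and the chain bot, F(bot), F(F(bot)), ... stabilizes.

def kleene_lfp(F, bot):
    x = bot
    while True:
        y = F(x)
        if y == x:          # fixpoint reached: F(x) = x
            return x
        x = y

# Toy instance on the finite lattice of subsets of {0,...,4}:
# F(S) = {0} ∪ { n+1 | n ∈ S, n < 4 } is monotonic w.r.t. set inclusion.
F = lambda S: frozenset({0} | {n + 1 for n in S if n < 4})
assert kleene_lfp(F, frozenset()) == frozenset({0, 1, 2, 3, 4})
```

If the domain were not ascending chain finite, this loop could run forever, which is exactly why the slide's conditions (or widening, later) are needed.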

## Termination: Discussion

• Showing monotonicity of S_P^α may be more difficult than showing that D_α meets the finiteness conditions

• There may be an S_P^α which terminates even if the conditions are not met

• Conditions can also be relaxed by restricting the class of programs (e.g. non-recursive programs pose few difficulties, although they are hardly interesting)

• In some cases an approximation from above (via gfp) can also be interesting

• There are other alternatives to finiteness: dynamic bounded depth, etc.
(See: Widening and Narrowing)

## Origins (General Programming)

• The idea itself (i.e. rule of signs) predates computation...

• The idea of computing by approximations was used as early as 1963 by Naur
("pseudo evaluation", in the Gier Algol compiler):
"a process which combines the operators and operands of the source text in the manner in which an actual evaluation would have to do it, but which operates on descriptions of the operands, not on their values"

• 1972, Sintzoff (proving well-formedness and termination properties)

• 1975, Wegbreit appears to be the first to develop a lattice-theoretic model

• Mid 70's: Kam, Kildall, Tarjan, Ullman, ...

• 1976,77, Patrick and Radhia Cousot proposed a formal model for the analysis of imperative ("flowchart") languages: unifying framework
• Define a "static" semantics: associate a set of possible storage states with each program point
• Dataflow analysis constructed then as a finitely computable approximation to the static semantics

## Analyzing Logic Programs

• Which semantics?
• Declarative semantics: concerned with what is a consequence of the program
• Model-theoretic semantics
• Fixpoint (T_P operator-based) semantics
can be close to what the program actually does (cf. database-style bottom-up evaluation)

• Operational semantics: close to the behavior of the program
• SLD-resolution based (success sets)
• Denotational
• Can cover possibilities other than SLD: reactive, parallel, ...

• Analyses based on declarative semantics are often called "bottom-up" analyses

• Analyses based on the (top-down) operational semantics are often called "top-down" analyses

• Also, intermediate cases (generally achieved through program transformation)

## Case Study: Fixpoint Semantics

• Given the first-order language L_P associated with a given program P, the Herbrand universe (U_P) is the set of all ground terms of L_P.

• The Herbrand Base (B_P) is the set of all ground atoms of L_P.

• A Herbrand Interpretation is a subset of B_P.
2^{B_P} is the set of all Herbrand interpretations.

• A Herbrand Model is a Herbrand interpretation which makes all the clauses of the program true.

• The Immediate Consequence Operator (T_P) is a mapping T_P : 2^{B_P} → 2^{B_P} defined by:

T_P(I) = { h | h :- b_1, ..., b_n is a ground instance of a clause of P and { b_1, ..., b_n } ⊆ I }

(in particular, if h is a ground instance of a fact of P, then h ∈ T_P(I), for every I).

• T_P is monotonic, so it has a least fixpoint lfp(T_P) which can be obtained as T_P ↑ ω starting from the bottom element of the lattice (the empty interpretation, ∅).

• (Characterization Theorem) [Van Emden and Kowalski]:
The Least Herbrand Model of P is lfp(T_P) = T_P ↑ ω
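For an already-ground program, T_P and its iteration up to the least Herbrand model can be sketched directly. A Python sketch; the example program (a small fragment of the naturals) is our own illustration, not from the slides:

```python
# T_P for a ground program, iterated from the empty interpretation.
# Clauses are (head, [body atoms]) pairs over ground atoms.

program = [
    ('nat(0)', []),
    ('nat(s(0))', ['nat(0)']),
    ('nat(s(s(0)))', ['nat(s(0))']),
    ('even(0)', []),
    ('even(s(s(0)))', ['even(0)']),
]

def T_P(I):
    """Immediate consequences: heads whose body atoms are all in I."""
    return {h for (h, body) in program if all(b in I for b in body)}

# T_P up-arrow omega (finite here, since the program is finite and ground):
I, prev = set(), None
while I != prev:
    prev, I = I, T_P(I)

# I is now the least Herbrand model of the program.
assert 'even(s(s(0)))' in I and 'nat(s(s(0)))' in I
```

A real T_P ranges over all ground instances of possibly non-ground clauses; restricting to a finite ground program keeps the sketch terminating and self-contained.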

## Fixpoint Semantics: Example

• Example: starting from T_P ↑ 0 = ∅, each iteration T_P ↑ (i+1) = T_P(T_P ↑ i) adds the heads whose bodies are already derived, until the least fixpoint is reached

• The lattice of interpretations is 2^{B_P}: all subsets of B_P

## "Bottom-up" Abstract Interpretation

• Finds an approximation of T_P ↑ ω by approximating T_P

• We apply abstract interpretation:
• Domain: D_α, s.t. elements of D_α approximate elements of 2^{B_P}.

• Concretization function: γ : D_α → 2^{B_P}

• Abstraction function: α : 2^{B_P} → D_α

• Operator abstraction: T_P^α, abstract version of the T_P operator

• Correctness:
• (2^{B_P}, α, D_α, γ) should be a Galois insertion, i.e. D_α a complete lattice and it should approximate 2^{B_P}:
∀I : I ⊆ γ(α(I)) and ∀I_α : α(γ(I_α)) = I_α

• T_P^α safe approximation of T_P, i.e. ∀I_α : α(T_P(γ(I_α))) ⊑ T_P^α(I_α)

• Termination:
• T_P^α monotonic.
• D_α (at least) ascending chain finite.

• Then, lfp(T_P^α) will be obtained in a finite number of steps
and will approximate lfp(T_P).

## "Bottom-up" Abstract Interpretation (Contd.)

Such "bottom-up" analyses have been proposed for example by Marriott and Sondergaard, and, more recently, by Codish, Dams, and Yardeni, Debray and Ramakrishnan, Barbuti, Giacobazzi, and Levi, and others.

## Example: "simple type" inference

• Minimal "type inferencing" problem [Sondergaard]:
Approximating which predicates have atoms in T_P ↑ ω

• pred(A): denotes the predicate symbol of an atom A

• Π_P (set of predicate symbols in a program P)
Then D_α = 2^{Π_P}

• Concretization function: γ(S) = { A ∈ B_P | pred(A) ∈ S }

• Abstraction function: α(I) = { pred(A) | A ∈ I }

• (2^{B_P}, α, 2^{Π_P}, γ) is a Galois insertion

## Example: "simple type" inference (Contd.)

• Abstract version of T_P (after some simplification):

T_P^α(S) = { pred(h) | h :- b_1, ..., b_n is a clause of P
and { pred(b_1), ..., pred(b_n) } ⊆ S }

• D_α finite (finite number of predicate symbols in program) and T_P^α monotonic

⇒ analysis will terminate in a finite number of steps and
lfp(T_P^α) approximates lfp(T_P).

## Example: "simple type" inference (Contd.)

• Example: the analysis starts from ∅ and iterates T_P^α, adding at each step the predicate symbols of clauses whose body predicates are already in the set

• Abstraction: α maps a set of atoms to its set of predicate symbols

• Concretization: γ maps a set of predicate symbols back to all atoms built from them

• Analysis: lfp(T_P^α) is a superset of the predicates with successful (ground) instances
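A sketch of this analysis on a small hypothetical program (p :- q.  q.  r :- s.), with clauses already reduced to their predicate symbols; all names and the example program are ours:

```python
# Abstract T_P for the "simple type" analysis: it works on sets of
# predicate symbols instead of sets of ground atoms.

clauses = [('p', ['q']), ('q', []), ('r', ['s'])]   # (pred(head), [pred(body)])

def abs_T_P(S):
    """Predicates of clauses whose body predicates are all in S."""
    return {h for (h, body) in clauses if set(body) <= S}

# Kleene iteration; terminates because the domain 2^{Pi_P} is finite.
S, prev = set(), None
while S != prev:
    prev, S = S, abs_T_P(S)

# p and q may succeed; r cannot, since s has no clauses:
assert S == {'p', 'q'}
```

Note the result is an over-approximation in general: a predicate in lfp(T_P^α) may still fail concretely, but one outside it (like r) provably has no success.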

## T_P-based Bottom-up Analysis: Discussion

• Simple and elegant. Based on the declarative, fixpoint semantics
• General: results independent of the query form

• Information only about procedure "exit". Normally information needed at various program points in compilation, e.g., "call patterns" (closures)

• The "logical variable" not observed (uses ground data). Information on instantiation state, substitutions, etc. often needed in compilation

• Not query-directed: analyzes the whole program, not the part (and modes) that correspond to "normal" use (expressed through a query form)

## T_P-based Bottom-up Analysis: Discussion (II)

• Solutions:

• Call patterns obtainable via "magic sets" transformation
[Marriott and Sondergaard]

Used also for query-directed analysis by [Barbuti et al.], [Codish et al.], [Gallagher et al.], [Ramakrishnan et al.], and others

• Enhanced fixpoint semantics
(e.g, S-semantics [Falaschi et al.], [Gaifman and Shapiro])

## "Top-down" analysis (summarized)

• Define an extended (collecting) concrete semantics, derived from SLD resolution,
making relevant information observable.

• Abstract domain: generally "abstract substitutions".

• Abstract operations: unification, composition, projection, extension, ...

• Abstract semantic function: takes a query form (abstraction of initial goal or set of initial goals) and the program and returns abstract descriptions of the substitutions at relevant program points.

• Variables complicate things:
• correctness (due to aliasing),
• termination (merging information related to different renamings of a variable)

• Logic variables are in fact (well behaved) pointers:
X = tree(N,L,R), L = nil, Y = N, Y = 3, ...
this makes analysis of logic programs very interesting
(and quite relevant to other paradigms).

## Domains

• Simple domains [Mellish, Debray], e.g.:
{ closed (ground), don't know, empty, free, non-var }
(abbreviated e.g. g, ?, ⊥, f, nv)

• May need to be very imprecise to be correct:
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
q(X,Y),
X = a.
q(Z,Z).

(after q(X,Y), X and Y are aliased, so X = a also grounds Y; without tracking sharing, the analysis must answer "don't know" for Y)

• Correct/more accurate treatment of aliasing [Debray]:
associate with each program variable a pair:
an abstraction of the set of terms the variable may be bound to, and
the set of program variables it may "share" with.

## Domains: Pair Sharing

• More accurate sharing - pair sharing [Sondergaard] [Codish]:
pairs of variables denoting possible sharing.
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
q(X,Y), % { X=f, Y=f } and { (X,Y) }
X = a.  % { X=g, Y=g } and { (X,Y) }
q(Z,Z).


• Note: we have used a "combined" domain: simple modes plus pair sharing

• Pair sharing can encode linearity:
:- entry p(X,Y) : ( free(X), free(Y) ).
p(X,Y) :-
q(X,Y),      % { X=f, Y=f } and { (X,Y) }
W = f(X,Y).  % { W=nv, X=f, Y=f } and { (W,W), (X,Y) }
q(Z,Z).


## Domains: Set Sharing

• Even more accurate sharing - set sharing [Jacobs et al.] [Muthukumar et al.]:
sets of sets of variables.

• A bit tricky to understand at first.

• Encodes grounding and independence
• X has no occurrence in any set: it is ground
• X and Y never occur together in any set: they are independent
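How groundness and independence are read off a set-sharing abstraction can be sketched as follows. A Python sketch; the example sharing set and variable names are ours:

```python
# Reading groundness and independence off a set-sharing abstraction:
# a set of sharing groups (sets of program variables).

sharing = {frozenset({'X', 'Y'}), frozenset({'Z'})}

def is_ground(v, sh):
    """v occurs in no sharing group: it can reach no free variable."""
    return all(v not in g for g in sh)

def independent(v, w, sh):
    """v and w never occur in the same group: they share no variable."""
    return all(not ({v, w} <= g) for g in sh)

assert is_ground('W', sharing)            # W occurs nowhere: ground
assert not is_ground('X', sharing)        # X may still be free / aliased
assert independent('X', 'Z', sharing)     # X and Z in disjoint groups
assert not independent('X', 'Y', sharing) # X and Y may share
```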

## Other domains

• Sharing+Freeness [Muthukumar et al.] (and + depth-K)
• Type graphs [Janssens et al.]
• Depth-K [Sato and Tamaki]
• Pattern structure [Van Hentenryck et al.]
• Variable dereferencing [VanRoy] [Taylor]
• ...

• Much work by [Codish et al.] [Filé et al.] [Giacobazzi et al.] ... on combining and comparing these domains

## Frameworks

• Debray: predicate level mode inference (call and success patterns for predicates). Unification reformulated as entry + exit unification. Termination by tabling.

• Jones, Marriott, and Sondergaard: using denotational semantics.

• Bruynooghe:
• Concrete semantics constructs "generalized" AND trees: nodes contain instances of a goal before and after execution: call substitution and success substitution.

• Analysis constructs "abstract AND-OR trees". Each represents a (possibly infinite) set of (possibly infinite) concrete trees. Widening to regular trees for termination.

• Framework is generic: parametric on some basic domain related functions + conditions for correctness and termination.

• Muthukumar and Hermenegildo: "PLAI" framework.
Improvement over previous frameworks: efficient fixpoint algorithms (dependency tracking) and memory savings (no explicit representation of the trees).

## Abstract AND-OR Tree

• Tree exploration for a query ?- p. and a clause h :- p1,...,pn.

[Figure: the abstract AND-OR tree explored during analysis]

• Basic operations:
• Procedure entry (a): from the call substitution of an atom obtain the entry substitution of each matching clause
• Entry-to-exit (b): from the entry substitution of a clause obtain its exit substitution
• Clause entry: abstract unification of the atom with the clause head (and clause exit, analogously, on return)
• Body traversal: from the substitution before each body atom obtain the one after it (iteratively applying (a))
• Procedure exit: from (each or all of the) clause exit substitutions obtain the success substitution

## Fixpoint Optimization

• Fixpoint required on recursive predicates only:

[Figure: fixpoint computation over (a) simply recursive and (b) mutually recursive predicates]

• Simply recursive (a)
• Mutually recursive (b)

"Use current success substitution and iterate until a fixpoint is reached"

## Other Improvements

• Abstract tree may contain several versions of the same predicate, with different abstract call patterns (for precision): useful for program specialization
("multivariance")

However, too many versions if not controlled
(solutions proposed [Gianotti et al.], [Jacobs et al.], [Puebla et al.])

• Much recent work in domains, improvement of fixpoints, application, etc. [Taylor],[VanRoy], GAIA [LeCharlier et al.]

• Abstract compilation:
compute over an "abstract version" of the program

• Reexecution [Bruynooghe, LeCharlier et. al.]
(alternative to keeping track of accurate sharing)

• Caching of operations [LeCharlier et al.]

## Analysis of Constraint Logic Programs

• CLP: (relation-based) programs over symbolic and non-symbolic domains: constraint satisfaction instead of unification (e.g. CLP(R), Prolog III, CHIP, etc.)

• Jorgensen, Marriott, and Michaylov [ISLP'91] and later Marriott and Stuckey [POPL'93] identified numerous opportunities for improvement via static analysis

• A number of proposals for analysis frameworks:
• Marriott and Sondergaard [NACLP90]:
denotational approach
• Codognet and Filé [ICLP'92]:
uses constraint solving for the analysis itself and "abstract compilation"
• G. de la Banda and Hermenegildo [WICLP'91, ILPS'93]

## Analysis of Constraint Logic Programs (Contd.)

• Example: Definiteness analysis (Def) [G. de la Banda et al.]
Domain: abstractions recording which variables are definite (constrained to a unique value by the current constraint store) and the definiteness dependencies among variables (e.g., "X becomes definite whenever Y and Z do")
• Other analyses:
• Freeness analysis [Dumortier et al.] and combinations.
• LSign [Marriott, Sondergaard and Stuckey, ILPS'94]

• Applications:
• optimization [Kelly et al., CP'96]
• parallelization [Bueno et al., PLILP'96]
• ...

## Origins (Declarative Paradigms, to CLP)

• A few milestones (on the road to CLP analysis):
• 1981, Mycroft: strictness analysis of applicative languages

• 1981, Mellish: proposes application to logic programs

• 1986, Debray: framework with safe treatment of logic variables, discussion of efficiency

• 1987, Bruynooghe: framework for LP based on and-or trees

• 1987, Jones and Sondergaard: framework based on a denotational definition of SLD

• 1988, Warren, Debray and Hermenegildo: efficiency and practicality of Abs. Int. for Logic Programs shown (for program parallelization)

• 1989, Muthukumar and Hermenegildo: PLAI generic system

• 1990, Van Roy / Taylor: application to sequential optimization of Prolog

• 1991, Marriott et al.: first extension to CLP

• 1992, Garcia de la Banda and Hermenegildo: generalization of Bruynooghe's algorithm to CLP, extension of PLAI

## Conclusions

• Abstract Interpretation is a very elegant program analysis technique

• It has, in addition, been shown to be useful and efficient. E.g., for LP and CLP:
• Static parallelization of logic (and CLP) programs [Hermenegildo et al]
• (Sequential) program optimization [Taylor, VanRoy, ...]
• Optimization of CLP programs [Marriott et al, ...]
• Abstract debugging, etc.

## Conclusions (and Coda!)

• Interesting issues studied for handling large real programs:
• Modularity
• Handling extra-logical features, higher order
• Handling dynamic code
• Support of test-debug cycle
Solutions include [see, e.g., papers in ESOP'96, SAS'96]:
• Module interface definition: modular analysis
• Analysis of "Full Prolog"
• Incremental analysis

• Demo!

Last modification: Wed Nov 22 23:57:35 CET 2006 <webmaster@clip.dia.fi.upm.es>