
Parallelization Tools

In the context of the partitioning tools, UPM has finalized the design and implementation of the tools that parallelize programs based on the notion of strict independence. A new tool architecture has been designed and used in this implementation, achieving a high degree of modularity among the different components (basic global analyzer, analysis domains, parallelization algorithms, annotation simplification, etc.). As a result, it has been possible to incorporate a large number of well-known approximation domains, and combinations thereof, into the &-Prolog/Ciao global analyzer. A recently implemented domain is the "depth-K sharing" domain developed at Southampton. It has also been possible to include in the &-Prolog/Ciao compiler the three parallelization (annotation) algorithms discussed in previous periods. Once the implementation was completed, the efficiency, accuracy, and effectiveness of the different global analysis domains and annotation algorithms were exhaustively evaluated [15, 12]. This study assesses the importance of abstract interpretation in automatic parallelization, sheds new light on the analysis domains, and clarifies their relationship to the parallelization process itself. While more experiments are still needed, the results so far imply that automatic program parallelization using global analysis based on a formal technique (abstract interpretation) is indeed practical both in terms of cost and results: analysis times are reasonable and good speedups are observed in a significant number of programs. In addition to the study of the performance of the different domains, the performance of the different parallelization algorithms was also assessed [13].
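To make this concrete, the following is a minimal, hand-written sketch (not output of the tools nor part of the benchmarks of [15, 12]) of the kind of annotation produced for strictly independent goals: in a doubly recursive predicate whose recursive calls share no variables, the parallelizer can join the calls with the &/2 parallel conjunction operator of &-Prolog/Ciao. The predicate names are purely illustrative.

    % Sequential version: the two recursive calls share no variables
    % (N1, N2, F1 and F2 are all distinct), so they are strictly independent.
    fib(0, 0).
    fib(1, 1).
    fib(N, F) :-
        N > 1,
        N1 is N - 1, N2 is N - 2,
        fib(N1, F1),
        fib(N2, F2),
        F is F1 + F2.

    % Annotated version: &/2 runs the two calls in independent and-parallel.
    fib_par(0, 0).
    fib_par(1, 1).
    fib_par(N, F) :-
        N > 1,
        N1 is N - 1, N2 is N - 2,
        fib_par(N1, F1) & fib_par(N2, F2),
        F is F1 + F2.

When the analysis cannot establish independence statically, the annotation algorithms instead emit conditional parallel expressions in which the &/2 goals are guarded by run-time groundness and independence checks.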

Also, the different components of the tools have been made independent, to a great extent, from the notion of independence among processes used during parallelization. As a result, it has been possible to implement the techniques and algorithms developed last year at UPM for detecting non-strict independent and-parallelism at compile-time, and to integrate this implementation smoothly into the &-Prolog/Ciao parallelizing compiler and system. This has resulted, to our knowledge, in the first tool capable of parallelizing programs which only exhibit non-strict independence. In the first version of this tool, only unconditional parallelism is detected and exploited (i.e., no run-time checks are generated). We are currently extending the implementation to support dynamic execution graphs (i.e., conditional parallelism). The assessment of the implementation of the parallelizer based on non-strict independence has also been started, and the results so far show the performance of the parallelized code to be quite encouraging. Some progress has also been made at the tool design level: the conditions for non-strict independence based on sharing/freeness developed in the previous period have been refined and improved, and new, more relaxed conditions have been proposed for the case of pure goals, based on the availability of compile-time abstract information regarding the partial answers occurring in the execution of these goals [12, 15].
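As an illustration of what non-strict independence buys, consider the classical difference-list version of flatten (a hand-made sketch of a standard textbook example, not code generated by the tool): the two recursive body calls share the variable that links the difference lists, but only the second call ever instantiates it, so under non-strict independence the calls can be annotated with &/2 without any run-time checks, even though they are not strictly independent.

    % Difference-list flatten.  In the second clause of flatten_dl/3 the
    % body goals share Ys1, so they are not strictly independent.  The
    % first call, however, only uses Ys1 as the free open tail of its
    % output list and never instantiates it; only the second call binds
    % it.  The two calls are therefore non-strictly independent and can
    % run in and-parallel.
    flatten(Xs, Ys) :-
        flatten_dl(Xs, Ys, []).

    flatten_dl([], Ys, Ys).
    flatten_dl([X|Xs], Ys, Zs) :-
        flatten_dl(X, Ys, Ys1) & flatten_dl(Xs, Ys1, Zs).
    flatten_dl(X, [X|Ys], Ys) :-
        atomic(X), X \== [].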

In addition to using more relaxed (but safe) notions of independence, such as non-strict independence, another way of improving the speedups obtained during parallelization is to specialize programs. We have also devised and implemented a novel technique for multiple specialization of programs. This technique uses the information provided by global analysis to distinguish the different uses of a given predicate, yielding simplified and/or specialized versions of that predicate and thus improving efficiency. The proposed technique can be used for many program optimizations and, in particular, to improve program partitioning. The specialization technique has been embedded in the &-Prolog parallelizing compiler, integrated with its global analysis framework PLAI, and evaluated experimentally [63].
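The following is a hedged sketch of the idea (the predicate names and the exact form of the run-time tests, written here as ground/1 and indep/2, are ours for illustration, not code produced by the system): multiple specialization lets the compiler keep a check-free parallel version of a predicate for the call patterns that analysis proves safe, while retaining a guarded version for the remaining call sites.

    % Single parallelized version: run-time checks are needed because p/2
    % is reached both from call sites with ground, independent arguments
    % and from fully general ones.
    p(X, Y) :-
        ( ground(X), indep(X, Y) ->
            q(X) & r(Y)
        ; q(X), r(Y)
        ).

    % After multiple specialization guided by global analysis: call sites
    % whose abstract call pattern guarantees the conditions are redirected
    % to p_checked/2, where both the tests and the sequential fallback
    % have been removed; the remaining call sites use p_general/2.
    p_checked(X, Y) :-
        q(X) & r(Y).
    p_general(X, Y) :-
        ( ground(X), indep(X, Y) ->
            q(X) & r(Y)
        ; q(X), r(Y)
        ).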

Finally, significant progress has been made in extending the analyzer to support full, practical languages [14] and in supporting large programs through techniques such as incremental global analysis [43].

At ECRC, the GAIP system, a general-purpose abstract interpreter, has been developed and interfaced with an extended version of the CASLOG cost analysis system. Based on the output of these two systems, a preliminary version of an or-parallel program annotator transforms sequential logic programs into or-parallel ones.
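The following is only a schematic guess at the shape of such an annotation (the declaration syntax and predicate names are assumptions on our part, loosely modeled on the parallel-predicate declarations used in ECRC's or-parallel systems, and not actual output of the annotator): cost estimates from CASLOG decide which predicates are coarse-grained enough to be worth declaring or-parallel.

    % Hypothetical annotator output: the alternative clauses of solve/2 are
    % estimated to be expensive, so the predicate is declared or-parallel
    % and the engine may explore its clauses concurrently.
    :- parallel solve/2.

    solve(Problem, Solution) :-
        strategy_a(Problem, Solution).
    solve(Problem, Solution) :-
        strategy_b(Problem, Solution).

    % Left sequential: the clause bodies are too cheap for or-parallel
    % execution to pay off.
    small_step(X, Y) :-
        Y is X + 1.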

