Heuristic Guided Pre-Optimized Algorithm Substitution for Parallel Computers
Related papers
Applying AI techniques to program optimization for parallel computers
This paper describes an experiment in integrating expert systems technology and advanced compiler optimization techniques to address the problem of parallelizing programs for different classes of parallel computers. Our approach is to separate machine features from programming heuristics and to organize the program parallelization knowledge in a hierarchical structure that we call the heuristic hierarchy. The reasoning mechanism of the program restructuring system uses the heuristic hierarchy, together with features of the program and the target machine, to choose appropriate sequences of transformations automatically. Theories and mechanisms for organizing and integrating the parallelism optimization knowledge are discussed. Methodologies for abstracting machine features, data management, and programming parallel computers are presented.
Lecture Notes in Computer Science, 1997
Code optimizations and restructuring transformations are typically applied before scheduling to improve the quality of generated code. However, in some cases, the optimizations and transformations do not lead to a better schedule or may even adversely affect the schedule. In particular, optimizations for redundancy elimination and restructuring transformations for increasing parallelism are often accompanied by an increase in register pressure. Therefore their application in situations where register pressure is already too high may result in the generation of additional spill code. In this paper we present an integrated approach to scheduling that enables the selective application of optimizations and restructuring transformations by the scheduler when it determines their application to be beneficial. The integration is necessary because information that is used to determine the effects of optimizations and transformations on the schedule is only available during instruction scheduling. Our integrated scheduling approach is applicable to various types of global scheduling techniques; in this paper we present an integrated algorithm for scheduling superblocks.
2017
We present a program transformation approach to convert procedural code into functionally equivalent code adapted to a given platform. Our framework is based on the application of guarded transformation rules that capture semantic conditions to ensure the soundness of their application. Our goal is to determine a sequence of rule applications which transform some initial code into final code which optimizes some non-functional properties. The code to be transformed is adorned with semantic annotations, either provided by the user or by external analysis tools. These annotations give information to decide whether applying a transformation rule is or is not sound. In general, there are several rules applicable at several program points and, besides, transformation sequences do not monotonically change the optimization function. Therefore, we face a search problem that grows exponentially with the length of the transformation sequence. In our experience with even small examples, that b...
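The search problem described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' framework: the rule names, the annotation strings, the `COST` table and the depth bound are all hypothetical. Each rule carries a guard that consults the annotations before the rewrite is allowed, and a bounded breadth-first search keeps the cheapest code it reaches. The costs are chosen so that `fuse` first makes the code more expensive and only the later `vectorize` step pays off, mirroring the non-monotonic behaviour the abstract mentions.

```python
from collections import deque

# Hypothetical guarded rules: a guard checks the annotations ("notes")
# to decide whether applying the rewrite is sound.
def guard_fuse(code, notes):
    return "loop_a" in code and "loop_b" in code and "independent(a,b)" in notes

def rewrite_fuse(code):
    rest = [op for op in code if op not in ("loop_a", "loop_b")]
    return tuple(rest + ["fused_ab"])

def guard_vectorize(code, notes):
    return "fused_ab" in code and "no_alias" in notes

def rewrite_vectorize(code):
    return tuple("vector_ab" if op == "fused_ab" else op for op in code)

RULES = [
    ("fuse", guard_fuse, rewrite_fuse),
    ("vectorize", guard_vectorize, rewrite_vectorize),
]

# Assumed cost table: fusing alone is worse than the two separate loops.
COST = {"loop_a": 10, "loop_b": 10, "fused_ab": 25, "vector_ab": 5}

def cost(code):
    return sum(COST.get(op, 1) for op in code)

def search(start, notes, max_depth=3):
    """Breadth-first search over rule sequences; returns (cost, code, path)."""
    best = (cost(start), start, [])
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        code, path = queue.popleft()
        if cost(code) < best[0]:
            best = (cost(code), code, path)
        if len(path) == max_depth:
            continue
        for name, guard, rewrite in RULES:
            if guard(code, notes):
                nxt = rewrite(code)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return best
```

Because the search explores past a cost increase, it can still find the two-step sequence. A greedy strategy that rejected any step that raised the cost would stop at the starting code.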
2014 International Conference on High Performance Computing & Simulation (HPCS), 2014
Research has shown that memory load/store instructions account for a significant share of execution time and energy consumption. Extracting available parallelism at different granularities has been an important approach for designing next-generation highly parallel systems. In this work, we present MIPT, an architecture exploration framework that leverages instruction parallelism of memory and ALU operations from a sequential algorithm's execution trace. MIPT heuristics recommend memory port sizes and issue slot sizes for memory and ALU operations. Its custom simulator simulates and evaluates the recommended parallel version of the execution trace to measure performance improvements over dual-port memory. MIPT's architecture exploration criterion is to improve performance by utilizing systems with multi-port memories and multi-issue ALUs. Design exploration tools such as Multi2Sim and Trimaran already exist. These simulators offer customization of multi-port memory architectures, but designers' starting points are usually unclear. Thus, MIPT can suggest a starting point for customization in those design exploration systems. In addition, given two different implementations of the same application, the MIPT simulator can compare their execution times.
GAPS: Iterative feedback directed parallelisation using genetic algorithms
1998
The compilation of FORTRAN programs for SPMD execution on parallel architectures often requires the application of program restructuring transformations such as loop interchange, loop distribution, loop fusion, loop skewing and statement reordering. Determining the optimal transformation sequence that minimises execution time for a given program is an NP-complete problem. The hypothesis of the research described here is that genetic algorithm (GA) techniques can be used to determine sequences of restructuring transformations that are better than, or as good as, those produced by more conventional compiler search techniques. The Genetic Algorithm Parallelisation System (GAPS) compiler framework is presented. GAPS uses a novel iterative feedback directed approach to autoparallelisation that is based upon genetic algorithm optimisation. Traditional restructuring transformations are represented as mappings that are applied to each statement and its associated iteration space. The hypothesis of GAPS is tested with a comparison of the performance of SPMD code produced by PFA, petit from the Omega Project at the University of Maryland and GAPS for an ADI 1024 × 1024 benchmark on an SGI Origin 2000. Encouraging initial results show that GAPS delivers performance improvements of up to 44% when the execution times of code produced by GAPS, PFA and petit from the Omega Project are compared for an ADI 1024 × 1024 benchmark on an SGI Origin 2000. On this benchmark, GAPS produces code having 21 to 25% improvements in parallel execution time.
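As a rough illustration of the idea in this abstract, the sketch below runs a small genetic algorithm over sequences of transformation names. It is not the GAPS implementation: the cost function `toy_cost`, its made-up target sequence, the population size and the mutation rate are all assumptions for illustration. A real feedback-directed system would compile and time the generated code instead.

```python
import random

# Transformation names taken from the abstract above.
TRANSFORMS = ["interchange", "distribution", "fusion", "skewing", "reorder"]

def toy_cost(seq):
    """Hypothetical stand-in for measured execution time (lower is better)."""
    # Assumption: an invented "ideal" sequence; cost counts mismatched positions.
    target = ["distribution", "interchange", "fusion", "skewing"]
    return sum(a != b for a, b in zip(seq, target))

def evolve(length=4, pop_size=20, generations=40, mutation_rate=0.2, seed=0):
    """Return the best transformation sequence found by a simple elitist GA."""
    rng = random.Random(seed)
    pop = [[rng.choice(TRANSFORMS) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=toy_cost)
        survivors = pop[: pop_size // 2]      # keep the better half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)   # pick two distinct parents
            cut = rng.randrange(1, length)    # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(length):           # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(TRANSFORMS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=toy_cost)
```

Because the better half of each generation is carried over unchanged, the best cost in the population never gets worse from one generation to the next. That property is a design choice of this sketch, not something the abstract states about GAPS.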
Data and computation transformations for multiprocessors
1995
Effective memory hierarchy utilization is critical to the performance of modern multiprocessor architectures. We have developed the first compiler system that fully automatically parallelizes sequential programs and changes the original array layouts to improve memory system performance. Our optimization algorithm consists of two steps. The first step chooses the parallelization and computation assignment such that synchronization and data sharing are minimized. The second step then restructures the layout of the data in the shared address space with an algorithm that is based on a new data transformation framework. We ran our compiler on a set of application programs and measured their performance on the Stanford DASH multiprocessor. Our results show that the compiler can effectively optimize parallelism in conjunction with memory subsystem performance.
Compiler transformations for high-performance computing
ACM Computing Surveys, 1994
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis. This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application. Programmers wishing to enhance the performance of their code can use this survey...
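To make "how to determine if it is legal" concrete for one transformation, the sketch below checks loop interchange using dependence distance vectors. It uses the standard textbook criterion (a loop permutation is legal if every permuted distance vector stays lexicographically non-negative) and is not code from the survey. The function names are hypothetical, and the distance vectors are assumed to come from a separate dependence analysis.

```python
def lex_nonnegative(vec):
    """True if the first nonzero entry is positive, or all entries are zero."""
    for d in vec:
        if d != 0:
            return d > 0
    return True

def interchange_legal(distance_vectors, i=0, j=1):
    """Check whether swapping loop levels i and j preserves all dependences."""
    for vec in distance_vectors:
        swapped = list(vec)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        if not lex_nonnegative(swapped):
            return False  # a dependence would run backwards after the swap
    return True
```

For example, a loop nest computing `a[i][j] = a[i-1][j-1]` has distance vector `(1, 1)`, which stays valid when the loops are swapped. One computing `a[i][j] = a[i-1][j+1]` has distance vector `(1, -1)`, which becomes `(-1, 1)` after the swap, so interchange is not legal there.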
Refactoring is the process of changing the structure of a program without changing its behavior. So far, refactoring has been deployed effectively only for sequential programs. However, with the increased availability of multicore systems, refactoring can play an important role in helping both expert and non-expert parallel programmers structure and implement their parallel programs. This paper describes the benefits of a refactoring approach for parallel programs on heterogeneous parallel architectures such as GPUs and CPUs. A refactoring-based methodology gives many advantages over unaided parallel programming: it helps identify general patterns of parallelism; it guides programmers through the process of refining a parallel program, whether new or existing; it enforces separation of concerns between application programmers and system programmers; and it reduces time to deployment. All of these advantages help programmers understand how to write parallel programs.
2014