Decreasing process memory requirements by overlapping program portions (original) (raw)

1998, Hawaii International Conference on System Sciences

Most compiler optimizations focus on saving time and sometimes come at the expense of increased code size. Yet processor speeds continue to increase at a faster rate than main memory and disk access times. Processors are now frequently used in embedded systems, which often have strict limitations on the size of the programs they can execute. Also, reducing the size of

Mutual Inlining: An Inlining Algorithm to Reduce the Executable Size

Embedded Systems and Applications, 2022

We consider the problem of selecting an optimized subset of inlinings (replacing a call to a function by its body) that minimizes the resulting code size. Frequently, in embedded systems, the program’s executable file must fit into a small memory. In such cases, the compiler should generate as small an executable as possible. In particular, we seek to improve the code size obtained by the LLVM inliner executed with the -Oz option. One important aspect is whether this problem requires a global solution that considers the full span of the call graph or a local solution (as is the case with the LLVM inliner) that decides whether to apply inlining to each call separately, based on the expected code-size improvement. We have implemented a global inlining algorithm called Mutual Inlining that selects the next call site (f() calls g()) to be inlined based on its global properties. The first property is the number of calls to g(). The next property is determining if inlining g(...
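The global criterion the abstract describes, weighing the number of calls to g() against the size of its body, can be illustrated with a small size-estimate sketch. This is not the paper's or LLVM's actual cost model; the constants and helper names (`CALL_OVERHEAD`, `size_delta`, `pick_next`) are illustrative assumptions:

```python
# Hedged sketch: a greedy, global size estimate for function inlining.
# Names and constants are illustrative, not the paper's or LLVM's model.

CALL_OVERHEAD = 2  # assumed instructions saved when a call sequence is removed

def size_delta(body_size: int, num_calls: int, removable: bool) -> int:
    """Estimated change in total code size if every call to a function
    is replaced by its body. Negative means the executable shrinks."""
    growth = num_calls * (body_size - CALL_OVERHEAD)
    savings = body_size if removable else 0  # body dropped once no callers remain
    return growth - savings

def pick_next(candidates):
    """Greedily choose the candidate whose inlining shrinks the program most;
    return None if no candidate reduces size."""
    best = min(candidates, key=lambda c: size_delta(*c))
    return best if size_delta(*best) < 0 else None

# A tiny function called often shrinks the program when inlined everywhere;
# a large, frequently called function grows it.
print(size_delta(body_size=1, num_calls=5, removable=True))   # negative
print(size_delta(body_size=40, num_calls=5, removable=True))  # positive
```

A global algorithm in this spirit would re-evaluate all candidates after each inlining decision, since inlining changes call counts and body sizes elsewhere in the call graph.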

GCDS: A compiler strategy for trading code size against performance in embedded applications

In this paper we present gcds, a new approach to code optimization in the context of embedded systems. gcds applies several transformation sequences to each piece of code and chooses a posteriori the best trade-off between code size and performance, given the particular constraints. The proposed implementation relies on a modular framework that makes it easy to handle different compilation strategies for the same piece of code. Key-words: VLIW, compiler, software pipelining, trade-off, code size, execution time, embedded systems. This study is supported by the OCEANS Esprit project.
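The a-posteriori selection step can be sketched as follows: compile a piece of code several ways, then keep the variant that best fits the given constraints. The variant names, sizes, and cycle counts below are made-up assumptions, not measurements from the paper:

```python
# Hedged sketch of a-posteriori variant selection in the spirit of gcds:
# among candidate compilations that meet the performance constraint,
# keep the smallest one. All concrete numbers are illustrative.
from typing import NamedTuple, Optional, List

class Variant(NamedTuple):
    name: str
    code_size: int   # bytes
    cycles: int      # estimated execution time

def choose(variants: List[Variant], cycle_budget: int) -> Optional[Variant]:
    """Pick the smallest variant whose execution time fits the budget."""
    feasible = [v for v in variants if v.cycles <= cycle_budget]
    return min(feasible, key=lambda v: v.code_size) if feasible else None

variants = [
    Variant("no-unroll",          code_size=120, cycles=900),
    Variant("software-pipelined", code_size=200, cycles=600),
    Variant("fully-unrolled",     code_size=480, cycles=550),
]
print(choose(variants, cycle_budget=700))
```

Swapping the objective and the constraint (fastest variant under a size budget) uses the same selection skeleton, which is the kind of flexibility the modular framework is meant to provide.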

The effect of compiler optimizations on available parallelism in scalar programs

1991

Abstract: In this paper we analyze the effect of compiler optimizations on fine grain parallelism in scalar programs. We characterize three levels of optimization: classical, superscalar, and multiprocessor. We show that classical optimizations not only improve a program's efficiency but also its parallelism. Superscalar optimizations further improve the parallelism for moderately parallel machines. For highly parallel machines, however, they actually constrain available parallelism.

Software Simultaneous Multi-Threading, a Technique to Exploit Task-Level Parallelism to Improve Instruction-and Data-Level Parallelism

The search for energy efficiency in the design of embedded systems is leading toward CPUs with higher instruction-level and data-level parallelism. Unfortunately, individual applications do not have sufficient parallelism to keep all these CPU resources busy. Since embedded systems often consist of multiple tasks, task-level parallelism can be used for this purpose. Simultaneous multi-threading (SMT) has proved a valuable technique for doing so in high-performance systems, but it cannot be afforded in systems with tight energy budgets. Moreover, it does not exploit data-level parallel hardware, and it does not exploit the available information on threads. We propose software-SMT (SW-SMT), a technique to exploit task-level parallelism to improve the utilization of both instruction-level and data-level parallel hardware, thereby improving performance. The technique performs simultaneous compilation of multiple threads at design time, and it includes a run-time selection of the most efficient mixes. We have applied the technique to two major blocks of an SDR (software-defined radio) application, achieving energy gains of up to 46% on different ILP and DLP architectures. We show that the potential of SW-SMT increases with SIMD datapath size and VLIW issue width.
