Guido Araujo - Academia.edu (original) (raw)
Uploads
Papers by Guido Araujo
IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 2005
Efficient address code optimization is a central problem in code generation for processors with r... more Efficient address code optimization is a central problem in code generation for processors with restricted addressing modes, like Digital Signal Processors (DSPs). This paper proposes a new heuristic to solve the Simple Offset Assignment (SOA) problem, the problem of allocating scalar variables to memory so as to minimize addressing code. This new approach, called Coalescing SOA (CSOA), performs variable memory slot coalescing simultaneously to offset assignment computation. Experimental results, based on compiling MediaBench benchmark programs with LANCE compiler, reveal a very significant improvement over the previous solutions to SOA. In fact, CSOA produces, on average, 37.3% fewer update instructions when comparing with the prior solution that perform memory slot coalescing before applying SOA, and 66.2% fewer update instructions when comparing with the best traditional SOA solution.
The increasing demand for wireless devices running mobile applications has renewed the interest o... more The increasing demand for wireless devices running mobile applications has renewed the interest on the research of high performance low power processors that can be programmed using very compact code. One way to achieve this goal is to design specialized processors with short instruction formats and shallow pipelines. Given that it enables such architectural features, indirect addressing is the most used addressing mode in embedded programs. This paper analyzes the problem of allocating address registers to array references in loops using auto-increment addressing mode. It leverages on previous work, which is based on a heuristic that merges address register live ranges. We prove, for the first time, that the merge operation is NP-hard in general, and show the existence of an optimal linear-time algorithm, based on dynamic programming, for a special case of the problem.
ACM Transactions on Design Automation of Electronic Systems, 1998
In this paper we address the problem of code generation for basic blocks in heterogeneous memory-... more In this paper we address the problem of code generation for basic blocks in heterogeneous memory-register DSP processors. We propose a new a technique, based on register-transfer paths, that can be used for efficiently dismantling basic block DAGs (Directed Acyclic Graphs) into expression trees. This approach builds on recent results which report optimal code generation algorithm for expression trees for these architectures. This technique has been implemented and experimentally validated for the TMS320C25, a popular fixed point DSP processor. The results show that good code quality can be obtained using the proposed technique. An analysis of the type of DAGs found in the DSPstone benchmark programs reveals that the majority of basic blocks in this benchmark set are expression trees and leaf DAGs. This leads to our claim that tree based algorithms, like the one described in this paper, should be the technique of choice for basic blocks code generation with heterogeneous memory register architectures
ACM Transactions in Embedded Computing Systems, 2004
IEEE Transactions on Very Large Scale Integration Systems, 2000
IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 2005
Efficient address code optimization is a central problem in code generation for processors with r... more Efficient address code optimization is a central problem in code generation for processors with restricted addressing modes, like Digital Signal Processors (DSPs). This paper proposes a new heuristic to solve the Simple Offset Assignment (SOA) problem, the problem of allocating scalar variables to memory so as to minimize addressing code. This new approach, called Coalescing SOA (CSOA), performs variable memory slot coalescing simultaneously to offset assignment computation. Experimental results, based on compiling MediaBench benchmark programs with LANCE compiler, reveal a very significant improvement over the previous solutions to SOA. In fact, CSOA produces, on average, 37.3% fewer update instructions when comparing with the prior solution that perform memory slot coalescing before applying SOA, and 66.2% fewer update instructions when comparing with the best traditional SOA solution.
The increasing demand for wireless devices running mobile applications has renewed the interest o... more The increasing demand for wireless devices running mobile applications has renewed the interest on the research of high performance low power processors that can be programmed using very compact code. One way to achieve this goal is to design specialized processors with short instruction formats and shallow pipelines. Given that it enables such architectural features, indirect addressing is the most used addressing mode in embedded programs. This paper analyzes the problem of allocating address registers to array references in loops using auto-increment addressing mode. It leverages on previous work, which is based on a heuristic that merges address register live ranges. We prove, for the first time, that the merge operation is NP-hard in general, and show the existence of an optimal linear-time algorithm, based on dynamic programming, for a special case of the problem.
ACM Transactions on Design Automation of Electronic Systems, 1998
In this paper we address the problem of code generation for basic blocks in heterogeneous memory-... more In this paper we address the problem of code generation for basic blocks in heterogeneous memory-register DSP processors. We propose a new a technique, based on register-transfer paths, that can be used for efficiently dismantling basic block DAGs (Directed Acyclic Graphs) into expression trees. This approach builds on recent results which report optimal code generation algorithm for expression trees for these architectures. This technique has been implemented and experimentally validated for the TMS320C25, a popular fixed point DSP processor. The results show that good code quality can be obtained using the proposed technique. An analysis of the type of DAGs found in the DSPstone benchmark programs reveals that the majority of basic blocks in this benchmark set are expression trees and leaf DAGs. This leads to our claim that tree based algorithms, like the one described in this paper, should be the technique of choice for basic blocks code generation with heterogeneous memory register architectures
ACM Transactions in Embedded Computing Systems, 2004
IEEE Transactions on Very Large Scale Integration Systems, 2000