Antonio Gonzalez - Academia.edu

Papers by Antonio Gonzalez

Near-optimal loop tiling by means of cache miss equations and genetic algorithms

The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transformations such as loop tiling, a code transformation targeted at reducing capacity misses. This paper presents a novel systematic approach to perform near-optimal loop tiling based on an accurate data locality analysis (Cache Miss Equations) and a powerful technique to search the solution space that is based on a genetic algorithm. The results show that this approach can remove practically all capacity misses for all considered benchmarks. The reduction of replacement misses results in a decrease of the miss ratio that can be as significant as a factor of 7 for the matrix multiply kernel.

Replacement equations. Given a reference, replacement equations represent the interferences with any other reference. For each pair of references, an expression gives the condition that determines whether they are mapped onto the same cache set.
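As an illustration of the loop tiling transformation named in the abstract above, the following is a minimal sketch of a tiled matrix multiply next to its untiled form. It is not the paper's method: the function names and the fixed `tile` parameter are assumptions made for this example, whereas the paper selects tile sizes using Cache Miss Equations and a genetic algorithm, neither of which is shown here.

```python
def matmul_naive(a, b):
    """Untiled triple loop: c = a * b for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Tiled version: the iteration space is split into tile x tile x tile
    blocks so that each block reuses a small working set.
    `tile` is a hypothetical fixed size chosen for illustration only."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer loops walk over the origins of the tiles.
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                # Inner loops stay inside one tile; min() handles edges
                # when the dimensions are not multiples of `tile`.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_tiled(a, b, tile=2))
```

Both functions compute the same product; only the order in which iterations are visited changes. In pure Python the cache effect is not measurable, so the sketch shows the loop structure rather than any performance result.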

Técnicas hardware para Optimizar el Uso de los Registros en Procesadores Superescalares (Hardware Techniques to Optimize Register Usage in Superscalar Processors)

Aggressive Speculative Execution for Hiding Memory Latency

Hardware support for early register release

Register files are becoming one of the critical components of current out-of-order processors in terms of delay and power consumption, since their potential to exploit instruction-level parallelism is closely related to the size and number of ports of the register file. In conventional register-renaming schemes, register releasing is conservatively done only after the instruction that redefines the same register is committed. Instead, we propose a scheme that releases registers as soon as the processor knows that there will be no further use of them. We present two early releasing hardware implementations with different performance/complexity trade-offs. Detailed cycle-level simulations show either a significant speedup for a given register file size, or a reduction in register file size for a given performance level.

Data Speculation

Multithreading and Speculation

Microarchitectural Techniques to Exploit Repetitive Computations and Values

Empowering a helper cluster through data-width aware instruction selection policies

Narrow values, which can be represented with fewer bits than the full machine width, occur very frequently in programs. On the other hand, clustering mechanisms enable cost- and performance-effective scaling of processor back-end features. Those attributes can be combined synergistically to design special clusters operating on narrow values (a.k.a. Helper Cluster), potentially providing performance benefits.

MICRO’S TOP PICKS FROM THE MICROARCHITECTURE CONFERENCES

Micro's Top Picks from the Microarchitecture Conferences

Register allocation technique

Reducing Misspeculation Penalties in Trace-Level Speculative Multithreaded Architectures

Lightly Threaded Code in the Era of Highly Threaded Processors

On-Chip Networks

Thread-Management Techniques to Maximize Efficiency in Multicore and Simultaneous Multithreaded Microprocessors (Article 9, 25 pages)

Adaptive Memory Hierarchies For Next Generation Tiled Microarchitectures

Support for speculative ownership without data

Managed instruction cache prefetching

Achieving coherence between dynamically optimized code and original code

Protecting data storage structures from intermittent errors
