Mateo Valero | Barcelona Supercomputing Center

Papers by Mateo Valero

Fair CPU time accounting in CMP+SMT processors

ACM Transactions on Architecture and Code Optimization, 2013

Processor architectures combining several paradigms of Thread-Level Parallelism (TLP), such as CMP processors in which each core is SMT, are becoming more and more popular as a way to improve performance at a moderate cost. However, the complex interaction between running tasks in hardware shared resources in multi-TLP architectures introduces complexities when accounting CPU time (or CPU utilization) to tasks. The CPU utilization accounted to a task depends on both the time it runs in the processor and the amount of processor hardware resources it receives. Deploying systems with accurate CPU accounting mechanisms is necessary to increase fairness. Moreover, it will allow users to be fairly charged on a shared data center, facilitating server consolidation in future systems. In this article we analyze the accuracy and hardware cost of previous CPU accounting mechanisms for pure-CMP and pure-SMT processors and we show that they are not adequate for CMP+SMT processors. Consequently, ...

Heuristics for register-constrained software pipelining

Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29

AMMC: Advanced Multi-Core Memory Controller

2014 International Conference on Field-Programmable Technology (FPT), 2014

A case for resource-conscious out-of-order processors

ACM SIGARCH Computer Architecture News, 2004

Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is achieved in part through proper sizing of critical resources, such as register files or instruction queues. In light of the increasing gap between processor speed and memory latency, tolerating upcoming latencies in this way would require impractical sizes of such critical resources. To tackle this scalability problem, we make a case for resource-conscious out-of-order processors. We present quantitative evidence that critical resources are increasingly underutilized in these processors. We advocate that better use of such resources should be a priority in future research in processor architectures. In particular, we present some of our research having such observations as a basis to deal with future resource-conscious processors.

Latency tolerant branch predictors

Innovative Architecture for Future Generation High-Performance Processors and Systems, 2003

PAMS: Pattern Aware Memory System for embedded systems

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14), 2014

Adapting cache partitioning algorithms to pseudo-LRU replacement policies

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010

A Flexible Heterogeneous Multi-Core Architecture

16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007

Circuit design of a dual-versioning L1 data cache

Integration, the VLSI Journal, 2012

Software Trace Cache

IEEE Transactions on Computers, 2005

Modulo scheduling with reduced register pressure

IEEE Transactions on Computers, 1998

Predictable performance in SMT processors: synergy between the OS and SMTs

IEEE Transactions on Computers, 2006

Better Branch Prediction Through Prophet/Critic Hybrids

Multicore Resource Management

Parallel Processing in Sequence Matching

Implementing kilo-instruction multiprocessors

ICPS '05. Proceedings. International Conference on Pervasive Services, 2005.

HPCC 2008 Organizing and Program Committees

2008 10th IEEE International Conference on High Performance Computing and Communications, 2008

Initial Results on Fuzzy Floating Point Computation for Multimedia Processors

IEEE Computer Architecture Letters, 2002

Aggressive Speculative Execution for Hiding Memory Latency

19th Euromicro Symposium on Microprocessing and Microprogramming (Euromicro 93), Barcelona, September 6-9, 1993
