Domingo Gimenez - Academia.edu
Papers by Domingo Gimenez
2018 International Conference on High Performance Computing & Simulation (HPCS), 2018
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016
2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2012
Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001
Modelling the behaviour of linear algebra algorithms is well suited to designing linear algebra software for high-performance computers, since such a model makes it possible to predict the execution time of the routines as a function of a number of parameters. The parameters fall into two groups: those whose values can be chosen by the user (number of processors, processor grid configuration, distribution of data in the system, block size) and those that specify the characteristics of the target architecture (arithmetic cost, and the start-up and word-sending costs of a communication operation). A linear algebra library could thus be designed so that each routine takes the values of the first group of parameters that give the expected optimum execution time, and then solves the problem. Such a library could be employed by non-expert users to solve scientific or engineering problems, because the user does not need to determine the values of these parameters. The design methodology is analysed with one-sided block Jacobi methods for the symmetric eigenvalue problem. Variants for a logical ring and a logical rectangular mesh of processors are considered. An analytical model of the algorithm is developed, and the behaviour of the algorithm is analysed with message passing using MPI on an SGI Origin 2000. With the parameters chosen by the model, the execution time drops from about 50% above the optimum to just 2% above it.
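The abstract does not reproduce the paper's model, but analytical models built from these two parameter groups typically take the following general shape, separating computation (governed by the arithmetic cost) from communication (governed by start-up and word-sending costs). This is a generic sketch, not the paper's exact expression:

```latex
% Generic execution-time model (a sketch built from the parameter
% groups named in the abstract, not the paper's exact formula).
% User-chosen parameters: p processors, block size b.
% Architecture parameters: arithmetic cost t_c, start-up t_s,
% word-sending cost t_w.
T(n,p,b) \;=\; t_c\,\frac{f(n)}{p}
          \;+\; t_s\,n_{\mathrm{msg}}(n,p,b)
          \;+\; t_w\,n_{\mathrm{words}}(n,p,b)
```

Here f(n) is the arithmetic operation count of the routine, and n_msg and n_words count the messages and words communicated; choosing p and b for a given machine then amounts to minimising T for the measured t_c, t_s and t_w.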
2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010
International Journal of Parallel Programming, 2013
The introduction of auto-tuning techniques in shared-memory linear algebra routines is analysed. Information obtained when the routines are installed is used at run time to take decisions that reduce the total execution time. The study is carried out with routines at different levels (matrix multiplication, LU and Cholesky factorizations, and routines for symmetric or general linear systems) and with calls to multithreaded routines in the LAPACK and PLASMA libraries. Medium-sized NUMA and large cc-NUMA systems are used in the experiments. This variety of routines, libraries and systems allows general conclusions to be drawn about the methodology for auto-tuning shared-memory linear algebra routines. Satisfactory execution times are obtained with the proposed methodology.
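A minimal sketch of the install-then-decide idea described above, not the paper's implementation: an installation phase times a kernel for several problem sizes and thread counts and stores the winners, and a run-time phase looks the decision up. NumPy's matmul stands in for the LAPACK/PLASMA routines, and the threadpoolctl package (assumed installed) caps the BLAS thread count; all function names here are illustrative.

```python
import json
import time

import numpy as np
from threadpoolctl import threadpool_limits  # assumed installed

def install_phase(sizes, max_threads, out_file="tuning.json"):
    """Hypothetical installation phase: time the kernel for each
    problem size and thread count and record the best thread count
    per size. NumPy's matmul is a stand-in for the tuned routines."""
    best = {}
    for n in sizes:
        a = np.random.rand(n, n)
        b = np.random.rand(n, n)
        times = {}
        for t in range(1, max_threads + 1):
            with threadpool_limits(limits=t):  # cap BLAS threads
                start = time.perf_counter()
                a @ b
                times[t] = time.perf_counter() - start
        best[n] = min(times, key=times.get)
    with open(out_file, "w") as f:
        json.dump(best, f)

def threads_for(n, tuning_file="tuning.json"):
    """Run-time decision: return the stored thread count for the
    installed problem size nearest to n."""
    with open(tuning_file) as f:
        best = {int(k): v for k, v in json.load(f).items()}
    nearest = min(best, key=lambda s: abs(s - n))
    return best[nearest]
```

install_phase([512, 1024, 2048], max_threads=8) would run once per machine; threads_for(n) is then a cheap lookup before each call, which is the essence of using installation information at run time.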
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Lecture Notes in Computer Science, 2008
This paper shows different strategies for improving metaheuristics for a task mapping problem: independent tasks with different computational costs and memory requirements are scheduled on a heterogeneous system with computational heterogeneity and memory constraints. The tuned methods proposed in this work could be used to optimize realistic systems, such as scheduling independent processes onto a processor farm.
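To make the problem concrete, here is a minimal hill-climbing sketch of the kind of metaheuristic being tuned: tasks are moved between processors only when the move keeps memory feasibility and reduces the makespan. The cost model (task cost divided by processor speed) and all names are hypothetical, and the sketch assumes feasible assignments are easy to sample.

```python
import random

def makespan(assign, cost, speed):
    """Completion time of the slowest processor: tasks are independent,
    so each processor's load is the sum of its tasks' costs divided by
    its relative speed (hypothetical heterogeneity model)."""
    loads = [0.0] * len(speed)
    for task, proc in enumerate(assign):
        loads[proc] += cost[task] / speed[proc]
    return max(loads)

def feasible(assign, mem_req, mem_cap):
    """Check the memory constraint on every processor."""
    used = [0] * len(mem_cap)
    for task, proc in enumerate(assign):
        used[proc] += mem_req[task]
    return all(u <= c for u, c in zip(used, mem_cap))

def local_search(cost, mem_req, speed, mem_cap, iters=10000, seed=0):
    """Hill climbing over single-task moves, keeping only feasible,
    improving neighbours."""
    rng = random.Random(seed)
    n, p = len(cost), len(speed)
    assign = [rng.randrange(p) for _ in range(n)]
    while not feasible(assign, mem_req, mem_cap):  # assumes this ends
        assign = [rng.randrange(p) for _ in range(n)]
    best = makespan(assign, cost, speed)
    for _ in range(iters):
        t, q = rng.randrange(n), rng.randrange(p)
        old = assign[t]
        assign[t] = q
        if feasible(assign, mem_req, mem_cap):
            m = makespan(assign, cost, speed)
            if m < best:
                best = m
                continue
        assign[t] = old  # undo infeasible or non-improving moves
    return assign, best
```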
2005 IEEE International Conference on Cluster Computing, 2005
This paper presents a self-optimization methodology for parallel linear algebra routines on heterogeneous systems. For each routine, a series of decisions is taken automatically in order to obtain an execution time close to the optimum, without rewriting the routine's code. These decisions include the number of processes to generate, the heterogeneous distribution of these processes over the network of processors, and the logical topology of the generated processes. Different heuristics are used to reduce the search space of such decisions. The experiments have been performed with a parallel LU factorization routine similar to the ScaLAPACK one, and good results have been obtained on different heterogeneous platforms.
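A sketch of the decision step for two of the choices named above (process count and logical grid): candidate configurations are scored with a cost model and the cheapest is kept. The model here is a deliberately crude stand-in, not the paper's, and the search is small enough to enumerate rather than prune with heuristics.

```python
def candidate_grids(p):
    """All r x c logical process grids with r * c == p."""
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]

def modeled_time(n, grid, t_c, t_s, t_w):
    """Hypothetical cost model for a blocked LU-like routine on an
    r x c grid: O(n^3) flops shared by the processes plus a simple
    communication term that grows with the grid perimeter."""
    r, c = grid
    p = r * c
    return t_c * (2 * n**3 / 3) / p + (t_s + t_w * n) * (r + c)

def choose_configuration(n, max_procs, t_c, t_s, t_w):
    """Evaluate the model for every process count and grid shape and
    return the cheapest (time, p, grid) triple."""
    best = None
    for p in range(1, max_procs + 1):
        for grid in candidate_grids(p):
            t = modeled_time(n, grid, t_c, t_s, t_w)
            if best is None or t < best[0]:
                best = (t, p, grid)
    return best
```

For example, choose_configuration(2048, 16, t_c=1e-9, t_s=1e-5, t_w=1e-7) picks the grid whose modelled computation/communication balance is best; a heterogeneous version would additionally weight each process by its processor's measured speed.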
Procedia Computer Science, 2014
Procedia Computer Science, 2014
Lecture Notes in Computer Science, 2008
This paper explains the use of metaheuristic techniques in a parallel computing course. In the practicals of the course, different metaheuristics are applied to a mapping problem in which processes are assigned to processors in a heterogeneous environment, with heterogeneity both in computation and in the network. The parallelization of the metaheuristics is also considered.
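The simplest parallelization a course like this usually covers first is master-worker evaluation of a population: individuals are independent, so their fitness can be computed concurrently. A minimal sketch under that assumption, with a placeholder objective (the practicals would instead score the process-to-processor mapping encoded by each individual):

```python
from multiprocessing import Pool

def fitness(mapping):
    """Placeholder objective (hypothetical): the practicals would
    return the modelled execution time of this mapping."""
    return sum(mapping)

def evaluate_population(population, workers=4):
    """Farm independent fitness evaluations out to a worker pool."""
    with Pool(processes=workers) as pool:
        return pool.map(fitness, population)

if __name__ == "__main__":
    population = [[0, 1, 2, 1], [2, 2, 0, 1], [1, 0, 0, 3]]
    print(evaluate_population(population))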
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2013
Some optimization problems can be tackled only with metaheuristic methods, and obtaining a satisfactory metaheuristic requires developing and experimenting with various methods and tuning them for each particular problem. A unified scheme for metaheuristics facilitates their development by reusing the basic functions. In our proposal, the unified scheme is improved by adding transitional parameters. These parameters are included in each of the functions in such a way that different values of the parameters yield different metaheuristics or combinations of metaheuristics; the unified parameterized scheme thus eases the development of metaheuristics and their application. In this paper, we present the basic ideas of the parameterization of metaheuristics. The methodology is tested by applying local and global search methods (greedy randomized adaptive search procedure (GRASP), genetic algorithms, and scatter search), and their combinations, to three scientific problems: obtaining satisfactory simultaneous equation models from a set of values of the variables, a task-to-processor assignment problem with independent tasks and memory constraints, and the p-hub median location-allocation problem. Index Terms: genetic algorithms (GAs), greedy randomized adaptive search procedure (GRASP), parameterized metaheuristics, scatter search (SS), unified metaheuristics.
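A toy illustration of the unified-scheme idea, not the paper's scheme or API: one fixed skeleton (initialize / improve / combine / include) whose parameter values switch the behaviour between methods. All parameter and function names here are illustrative, and the objective is a trivial bit-string problem.

```python
import random

rng = random.Random(0)

def fitness(x):
    """Toy objective (hypothetical): minimize the number of ones."""
    return sum(x)

def initialize(n_items, size):
    return [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(size)]

def improve(pop, intensity):
    """Basic improvement function: single-bit-flip local search."""
    out = []
    for x in pop:
        x = x[:]
        for _ in range(intensity):
            y = x[:]
            y[rng.randrange(len(y))] ^= 1
            if fitness(y) < fitness(x):
                x = y
        out.append(x)
    return out

def combine(pop, n_children):
    """Basic combination function: one-point crossover of random pairs."""
    children = []
    for _ in range(n_children):
        a, b = rng.sample(pop, 2)
        cut = rng.randrange(1, len(a))
        children.append(a[:cut] + b[cut:])
    return children

def unified_metaheuristic(n_items, params, iters=20):
    """One skeleton, many behaviours: pop_size=1 with n_children=0
    degenerates to a GRASP-like multistart local search, while a
    larger population with crossover behaves like a genetic
    algorithm. Parameter names are illustrative, not the paper's."""
    pop = improve(initialize(n_items, params["pop_size"]),
                  params["init_intensity"])
    for _ in range(iters):
        if params["n_children"] > 0 and len(pop) > 1:
            children = combine(pop, params["n_children"])
        else:
            children = initialize(n_items, 1)  # restart, GRASP-style
        children = improve(children, params["child_intensity"])
        pop = sorted(pop + children, key=fitness)[: params["pop_size"]]
    return min(pop, key=fitness)
```

For instance, unified_metaheuristic(30, {"pop_size": 10, "n_children": 10, "init_intensity": 5, "child_intensity": 5}) runs as a GA-like method, while {"pop_size": 1, "n_children": 0, ...} runs as multistart local search: the same code, different parameter values.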
2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, 2011
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores, 2013
Procedia Computer Science, 2013