Roberto Solar - Profile on Academia.edu
Papers by Roberto Solar
Towards rapid population genetics forward-in-time simulations
Winter Simulation Conference, Dec 3, 2017
Computer simulations are an important tool for current research in population and evolutionary genetics. They help researchers understand the evolutionary dynamics of complex processes that cannot be predicted analytically. The basic idea is to generate synthetic data sets of genetic polymorphisms under user-specified scenarios describing the evolutionary history and genetic architecture of a species. In this work, we focus on forward-in-time simulations, which represent the most powerful but also the most compute-intensive approach for simulating the genetic material of a population. We present a highly optimized forward-in-time simulation library called Libgdrift, specially designed to create large sets of replicated simulations. Our simulation library uses code optimizations such as spatial locality and a two-phase data compression approach, which allow fast simulation executions while reducing memory storage. Results show that our proposal can improve on the performance reported by well-known simulation software.
Simulating the cost of applications running on large clusters of processors poses difficulties in model definition and simulation. In this paper we propose a methodology to ease this burden. The user specifies a model graphically as a timed coloured Petri net, using a tool such as CPN Tools. The model is automatically converted into an XML specification. A code generator then converts the XML file into C++ code, which is linked to a simulation kernel. The result is an efficient and scalable simulation program that can be executed sequentially or in parallel on either single multi-core processors or clusters of multi-core processors. We illustrate the suitability of our proposal by modelling and simulating the cost of message passing in a fat-tree network, which is commonly used to support communication in clusters of processors.
Listas de clusters usando centros espacialmente dispersos para búsquedas por similitud en espacios métricos
Evaluation of Static/Dynamic Cache for Similarity Search Engines
Lecture Notes in Computer Science, 2016
In large-scale search systems, where achieving a high query throughput is important, cache strategies are a feasible tool for reaching this goal. A number of efficient cache strategies devised for exact query search in different application domains have been proposed so far. In similarity query search on metric spaces, it is necessary to consider additional design requirements aimed at producing good-quality approximate results from the cache content. In this paper, we propose a Static/Dynamic cache strategy for metric spaces which takes advantage of the results of static cache miss operations, and their associated distance evaluations, to increase the overall performance of the cache. We present an experimental evaluation of the performance obtained with our strategy for different query selection/replacement strategies.
Parallel simulation is a powerful tool to evaluate the performance of large-scale systems. However, when it comes to simulating large-scale Web search engines, parallel simulation execution can introduce imbalance among processors because event occurrence is driven by user behavior, which is unpredictable, making events take place in different parts of the system in an irregular manner. In this paper, we study the impact of load balance strategies on the performance of a parallel simulation strategy. In particular, we present a consistent hashing load balance algorithm that aims to reduce queuing waiting times, evenly distribute the cost of executing events among processors and, more importantly, ensure that migration of logical processes occurs only between neighboring processors. We use a Web search engine composed of services deployed on a cluster of processors as the application case study. Our simulations are driven by actual query log traces. We evaluate our proposed load balance algorithm in ...
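The abstract above names consistent hashing as the basis of its load balance algorithm but does not give the details. As a rough illustration only, the following is a minimal sketch of a generic consistent hash ring that maps logical processes to processors. The class name `HashRing`, the `replicas` parameter, and the use of SHA-256 are assumptions made for this example; this is not the authors' algorithm and does not model their neighbor-only migration rule.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring (illustrative; not the paper's algorithm)."""

    def __init__(self, processors, replicas=64):
        # Each processor is placed at `replicas` pseudo-random points on the
        # ring so that load spreads more evenly (hypothetical default of 64).
        ring = []
        for p in processors:
            for r in range(replicas):
                ring.append((self._hash(f"{p}#{r}"), p))
        ring.sort()
        self._points = [h for h, _ in ring]
        self._owners = [p for _, p in ring]

    @staticmethod
    def _hash(key):
        # Stable 64-bit hash; Python's built-in hash() is salted per process.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def owner(self, lp_id):
        # A logical process belongs to the first ring point clockwise from
        # its own hash, wrapping around at the end of the ring.
        i = bisect.bisect_right(self._points, self._hash(lp_id))
        return self._owners[i % len(self._owners)]
```

The property that makes this family of schemes attractive for load balancing is that adding a processor only moves logical processes onto the new processor; assignments among the existing processors are left untouched.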
A graph-based cache for large-scale similarity search engines
The Journal of Supercomputing
A Service-Oriented Platform for Approximate Bayesian Computation in Population Genetics
Journal of Computational Biology
Estrategias de Construcción sobre Estructuras Métricas para Búsquedas por Similitud
Listas de Clusters usando Centros Espacialmente Dispersos para Búsquedas por Similaridad en Espacios Métricos
Load Balance Strategies for DEVS Approximated Parallel and Distributed Discrete-Event Simulations
2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2015
Approximate parallel simulation of web search engines
Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation - SIGSIM-PADS '13, 2013
Procedia Computer Science, 2013
Parallel discrete event simulation (PDES) has been shown to be a useful paradigm for simulating complex and large-scale models. An individual-oriented approach allows modelers to capture complex emergent global behaviors generated by simple local interactions, such as those observed in self-organized systems. Usually, this type of simulation is highly expensive in terms of computing and communication. On the one hand, we can reduce the computing involved in individual interactions by developing a robust partitioning method. On the other hand, we have to be able to efficiently handle a huge number of individuals interacting with other individuals stored in the memory of remote processors. In this work we analyze and compare three communication strategies: synchronous and asynchronous message passing (via MPI) and bulk-synchronous parallel (BSP) for our distributed cluster-based individual-oriented fish school simulator. For this type of simulation, the main contributions of our work are: a) we show that distributed time-driven simulations do not always improve performance when using synchronous communication strategies, b) we show that asynchronous communication strategies are more efficient. In addition, we have verified that the bulk-synchronous parallel method is scalable.
Procedia Computer Science, 2010
Computational simulation has been used as a powerful tool to represent the dynamical behavior of systems based on complex analytic models. These types of models have two main drawbacks: a) limitations due to the degree of abstraction needed to simulate them, b) the high computing power required to simulate even heavily simplified models. The computing power available today can overcome these limitations and allows quicker simulations of complex models that are closer to reality. In this paper, the experiments and performance analysis of a distributed simulation for a complex individual-oriented model (fish schools) are presented. The fish school simulator makes it possible to work with large models that include large numbers of fish (> 10^6 individuals), predators and obstacles in the simulated world.
Procedia Computer Science, 2012
Partitioning and load balancing are highly important issues in distributed individual-oriented simulation. Choosing how to distribute individuals over the distributed environment can be a crucial factor when executing the simulation. Partitioning an individual-oriented system should be efficient in order to reduce the communication involved in interactions between individuals belonging to different logical processes. Furthermore, if the individual-oriented model exhibits mobility patterns, we should be able to maintain the load balance in order to preserve global application performance. In this work, we present a proximity load balancing strategy for a distributed cluster-based individual-oriented fish school simulator. On the one hand, we implement a robust cluster-based partitioning method by means of the covering radius criterion and Voronoi diagrams. We use a proximity criterion to distribute individuals over the distributed architecture. On the other hand, we propose a proximity load balancing strategy in order to maintain application performance as the simulation progresses.
Procedia Computer Science, 2011
Individual-oriented simulation allows us to represent the global behavior of a system through local interaction in discrete time steps. As we face close-to-reality models and large-scale workloads, we focus on turning from traditional approaches towards distributed simulation in order to obtain more accurate results in less time. One of the main problems in distributed simulation is how to distribute individuals efficiently across the distributed architecture. Individual-oriented systems can be implemented in a distributed fashion by using either a grid-based or a cluster-based approach. On the one hand, grid-based approaches consist of assigning to each node a portion of the simulation space, together with the set of individuals currently residing in that area. On the other hand, cluster-based approaches consist of assigning to each node a fixed set of individuals. In this work we present a cluster-based method based on Voronoi diagrams and the covering radius criterion in order to avoid unnecessary interaction between individuals. We show experimentally that our proposal reduces communication and computing times, significantly increasing simulation efficiency.
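The abstract above mentions Voronoi diagrams and a covering radius criterion without spelling out the procedure. The following is a minimal sketch of one common reading of those terms, not the authors' implementation: each individual is assigned to its nearest center (its Voronoi cell), each cluster records the distance to its farthest member (its covering radius), and two clusters are compared only if the triangle inequality cannot rule out an interaction. The function names and the `vision` parameter (an assumed maximum interaction range) are hypothetical.

```python
import math


def assign_to_centers(points, centers):
    """Assign each point to its nearest center and track covering radii."""
    clusters = [[] for _ in centers]
    radii = [0.0] * len(centers)
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        k = min(range(len(centers)), key=dists.__getitem__)
        clusters[k].append(p)
        # Covering radius: distance from the center to its farthest member.
        radii[k] = max(radii[k], dists[k])
    return clusters, radii


def may_interact(i, j, centers, radii, vision):
    """Return False only when no pair of members can be within `vision`.

    By the triangle inequality, members a of cluster i and b of cluster j
    with dist(a, b) <= vision imply
    dist(c_i, c_j) <= radii[i] + vision + radii[j].
    """
    return math.dist(centers[i], centers[j]) <= radii[i] + radii[j] + vision
```

Under these assumptions, a node only needs to exchange individuals with nodes whose clusters pass the `may_interact` check, which is one way such a criterion could avoid unnecessary communication.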