Alessandro Pellegrini - Academia.edu

Papers by Alessandro Pellegrini

Programming agent-based demographic models with cross-state and message-exchange dependencies: A study with speculative PDES and automatic load-sharing

2016 Winter Simulation Conference (WSC), 2016

Agent-based modeling and simulation is a versatile and promising methodology to capture complex interactions among entities and their surrounding environment. A great advantage is its ability to model phenomena at a macro scale by exploiting simpler descriptions at a micro level. It has been proven effective in many fields, and it is rapidly becoming a de facto standard in the study of population dynamics. In this article we study programmability and performance aspects of the last-generation ROOT-Sim speculative PDES environment for multi/many-core shared-memory architectures. ROOT-Sim transparently offers a programming model where interactions can be based on both explicit message passing and in-place state accesses. We introduce programming guidelines for systematic exploitation of these facilities in agent-based simulations, and we study the effects on performance of an innovative load-sharing policy targeting these types of dependencies. An experimental assessment with synthetic and real-world applications is provided to assess the validity of our proposal.

1 INTRODUCTION

Context. Agent-based modeling (ABM) is a simulation technique providing abstract representations of a scenario via a descriptive model targeted at reproducing its evolution through its components, including their decision-making capabilities and interaction patterns. An agent, which is a component of the overall descriptive model, can be defined as an entity (theoretical, virtual, or physical) capable of acting on itself and on the environment in which it evolves, and capable of communicating/interacting with other agents (Jennings, Sycara, and Wooldridge 1998). ABM is very effective in capturing interactions at a macro scale which directly or indirectly come from the way agents behave at a micro scale. In this sense, the individual or collective interactions among agents can be used to effectively derive properties of general systems which could be difficult to study using more traditional simulation techniques. Therefore, the intrinsic expressive power offered by ABM makes it a proven solution to explore complex real-world scenarios, such as disaster rescue (

Granular Time Warp Objects

Proceedings of the 2016 annual ACM Conference on SIGSIM Principles of Advanced Discrete Simulation - SIGSIM-PADS '16, 2016

A recent trend has shown the relevance of PDES paradigms where simulation objects are no longer seen as fully disjoint entities interacting only via event scheduling. In particular, mutual cross-state access (as a form of state sharing) can simplify the programmer's job. In this article, we present a multi-core oriented Time Warp platform supporting so-called granular objects, where cross-state access is transparently enabled jointly with the dynamic clustering (granulation) of objects into groups depending on the volume of mutual state accesses along phases of the model execution. Each group represents an island where activities are sequentially dispatched in timestamp order. Concurrency is still preserved by enabling the optimistic execution of the different islands. Granulated objects do not pay synchronization costs due to mutual causal inconsistencies. Also, the underlying Time Warp platform does not pay memory management (e.g. memory access tracing) overheads to determine that mutual accesses are taking place within a group. Overall, the platform transparently (and dynamically) determines a well-suited granulation of the overall model state, and a corresponding level of concurrency, depending on the actual state access pattern of the simulation code. As far as we know, this is the first study where the problem of clustering Time Warp simulation objects is addressed for the case of in-place cross-object state accesses by the application code, and where dynamic granulation of multiple objects into a larger one is supported in a fully transparent manner. We integrated our proposal in the open source ROOT-Sim platform.

Hardware-Transactional-Memory Based Speculative Parallel Discrete Event Simulation of Very Fine Grain Models

2015 IEEE 22nd International Conference on High Performance Computing (HiPC), 2015

This article presents an innovative runtime support for speculative parallel processing of discrete event simulation models on multi-core architectures, which exploits Hardware-Transactional-Memory (HTM) facilities for the purpose of state recoverability. In this proposal, the speculative updates on the state of the simulation model are executed as concurrent HTM-based transactions that are also in charge of detecting whether the update is consistent with the advancement of logical time along model execution. Our proposal is fully transparent to the application code. Hence, our HTM-based run-time support can host conventionally developed discrete event models relying on the concept of event handlers to be dispatched by an underlying simulation engine. Experimental data show that our proposal provides 75% to 92% of the ideal speedup on an Intel Haswell based platform (equipped with 4 physical cores and HTM support) for discrete event models with event granularity ranging between 2 and 12 microseconds. The data also show that these same models cannot be executed efficiently on top of a last-generation parallel discrete event simulation platform employing software-based recoverability.

RAMSES: Reversibility-Based Agent Modeling and Simulation Environment with Speculation-Support

Lecture Notes in Computer Science, 2015

A Study on the Parallelization of Terrain-Covering Ant Robots Simulations

Lecture Notes in Computer Science, 2014

Agent-based simulation is used as a tool for supporting (time-critical) decision making in differentiated contexts. Hence, techniques for speeding up the execution of agent-based models, such as Parallel Discrete Event Simulation (PDES), are of great relevance/benefit. On the other hand, parallelism entails that the final output provided by the simulator should closely match the one provided by a traditional sequential run. This is not obvious given that, for performance and efficiency reasons, parallel simulation engines do not allow the evaluation of global predicates on the simulation model evolution with arbitrary time granularity along the simulation time axis. In this article we present a study on the effects of parallelization of agent-based simulations, focusing on complementary aspects such as performance and reliability of the provided simulation output. We target Terrain Covering Ant Robots (TCAR) simulations, which are useful in rescue scenarios to determine how many agents (i.e., robots) should be used to completely explore a certain terrain for possible victims within a given time.

A Framework for High Performance Simulation of Transactional Data Grid Platforms

Proceedings of the Sixth International Conference on Simulation Tools and Techniques, 2013

The large diffusion of shared-memory multi-core machines has impacted the way Parallel Discrete Event Simulation (PDES) engines are built. While they were originally conceived as data-partitioned platforms, where each thread is in charge of managing a subset of simulation objects, nowadays the trend is to shift towards share-everything settings. In this scenario, any thread can (in principle) take care of CPU-dispatching pending events bound to whichever simulation object, which helps to fully share the load across the available CPU-cores. Hence, a fundamental aspect to be tackled is to provide an efficient globally-shared pending events' set from which multiple worker threads can concurrently extract events to be processed, and into which they can concurrently insert newly produced events to be processed in the future. To cope with this aspect, we present the design and implementation of a concurrent non-blocking pending events' set data structure, which can be seen as a variant of a classical calendar queue. Early experimental data collected with a synthetic stress test are reported, showing excellent scalability of our proposal on a machine equipped with 32 CPU-cores.

CCS Concepts: • Theory of computation → Data structures design and analysis; Shared memory algorithms; • Computing methodologies → Discrete-event simulation;
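
For readers unfamiliar with the classical calendar queue mentioned in the abstract above, the following is a minimal sequential sketch in Python. It is not the concurrent non-blocking variant the abstract describes: there is no synchronization and no dynamic resizing, and the class name, method names, and default parameters are assumptions made for illustration only.

```python
class CalendarQueue:
    """Sequential calendar queue sketch: events are hashed into buckets by timestamp.

    Illustrative only; the structure described in the abstract is concurrent
    and non-blocking, which this sketch does not attempt to reproduce.
    """

    def __init__(self, num_buckets=8, bucket_width=1.0):
        self.num_buckets = num_buckets
        self.bucket_width = bucket_width
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0
        self.current_time = 0.0  # timestamp of the last dequeued event

    def _slot(self, timestamp):
        # Absolute slot ("day") the timestamp falls into.
        return int(timestamp // self.bucket_width)

    def enqueue(self, timestamp, payload):
        if timestamp < self.current_time:
            raise ValueError("cannot schedule an event in the past")
        bucket = self.buckets[self._slot(timestamp) % self.num_buckets]
        bucket.append((timestamp, payload))
        bucket.sort(key=lambda event: event[0])  # stable: ties keep insertion order
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("dequeue from an empty calendar queue")
        start = self._slot(self.current_time)
        # Scan one full "year" of slots, starting from the current one.
        for offset in range(self.num_buckets):
            slot = start + offset
            bucket = self.buckets[slot % self.num_buckets]
            if bucket and self._slot(bucket[0][0]) == slot:
                return self._take(bucket)
        # Nothing due within a year: fall back to a direct search for the minimum.
        bucket = min((b for b in self.buckets if b), key=lambda b: b[0][0])
        return self._take(bucket)

    def _take(self, bucket):
        event = bucket.pop(0)
        self.current_time = event[0]
        self.size -= 1
        return event


queue = CalendarQueue(num_buckets=4, bucket_width=1.0)
for ts, name in [(5.5, "a"), (0.3, "b"), (12.0, "c"), (0.7, "d"), (100.2, "e")]:
    queue.enqueue(ts, name)
order = [queue.dequeue()[1] for _ in range(5)]
```

Because every queued timestamp is at least `current_time`, the head of the first bucket whose earliest event belongs to the slot being scanned is the global minimum, so events come out in timestamp order.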

Cache-Aware Memory Manager for Optimistic Simulations

Proceedings of the Fifth International Conference on Simulation Tools and Techniques, 2012

Parallel Discrete Event Simulation is a well-known technique for executing complex general-purpose simulations where models are described as objects whose interaction is expressed through the generation of impulsive events. In particular, Optimistic Simulation allows full exploitation of the available computational power, avoiding the need to compute safety properties for the events to be executed. Optimistic Simulation platforms internally rely on several data structures, which are meant to support operations aimed at ensuring correctness, inter-kernel communication and/or event scheduling. These housekeeping and management operations access them according to complex patterns, commonly suffering from misuse of memory caching architectures. In particular, operations like log/restore access data structures on a periodic basis, causing the replacement of in-cache buffers related to the actual working set of the application logic and, in turn, a non-negligible performance drop. In this work we propose generally-applicable design principles for a new memory management subsystem targeted at Optimistic Simulation platforms, which addresses this issue by allocating memory buffers according to their expected future access patterns, in order to enhance event-execution memory locality. Additionally, an application-transparent implementation within ROOT-Sim, an open-source general-purpose optimistic simulation platform, is presented along with experimental results testing our proposal.

An Evolutionary Algorithm to Optimize Log/Restore Operations within Optimistic Simulation Platforms

Proceedings of the 4th International ICST Conference on Simulation Tools and Techniques, 2011

In this work we address state recoverability in advanced optimistic simulation systems by proposing an evolutionary algorithm to optimize at run-time the parameters associated with state log/restore activities. Optimization takes place by adaptively selecting for each simulation object both (i) the best-suited log mode (incremental vs non-incremental) and (ii) the corresponding optimal value of the log interval. Our performance optimization approach makes it possible to cope indirectly with hidden effects (e.g., locality) as well as cross-object effects due to the variation of log/restore parameters for different simulation objects (e.g., rollback thrashing). Neither of them is captured by literature solutions based on analytical models of the overhead associated with log/restore tasks. In more detail, our evolutionary algorithm dynamically adjusts the log/restore parameters of distinct simulation objects as a whole, towards a well-suited configuration. In this way, we prevent negative effects on performance due to the biasing of the optimization towards individual simulation objects, which may cause reduced gains (or even a decrease) in performance precisely because of the aforementioned hidden and/or cross-object phenomena. We also present an application-transparent implementation of the evolutionary algorithm within the ROme OpTimistic Simulator (ROOT-Sim), namely an open source, general purpose simulation environment designed according to the optimistic synchronization paradigm. Further, we provide the results of an experimental study testing our proposal on a suite of simulation models for wireless communication systems.
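
To make the role of the log interval concrete, here is a minimal Python sketch of non-incremental (full-copy) periodic state logging with rollback. It does not reproduce the evolutionary algorithm or ROOT-Sim's implementation; the class `LoggedObject`, its methods, and the toy handler are hypothetical names introduced for illustration. A larger log interval means fewer snapshots but more events to replay after a rollback.

```python
import copy


class LoggedObject:
    """Simulation object that takes a full state snapshot every `log_interval` events."""

    def __init__(self, state, handler, log_interval):
        self.state = state
        self.handler = handler            # handler(state, event) mutates state in place
        self.log_interval = log_interval
        self.processed = []               # events processed so far, in order
        self.snapshots = {0: copy.deepcopy(state)}  # event count -> state copy

    def process(self, event):
        self.handler(self.state, event)
        self.processed.append(event)
        count = len(self.processed)
        if count % self.log_interval == 0:
            self.snapshots[count] = copy.deepcopy(self.state)

    def rollback(self, keep):
        """Restore the state reached after the first `keep` events.

        Returns how many events had to be replayed from the restored snapshot.
        """
        if not 0 <= keep <= len(self.processed):
            raise ValueError("keep is out of range")
        base = max(n for n in self.snapshots if n <= keep)
        self.state = copy.deepcopy(self.snapshots[base])
        replay = self.processed[base:keep]
        for event in replay:
            self.handler(self.state, event)
        self.processed = self.processed[:keep]
        self.snapshots = {n: s for n, s in self.snapshots.items() if n <= keep}
        return len(replay)


def add_to_sum(state, event):
    # Toy event handler: accumulate the event value.
    state["sum"] += event


frequent = LoggedObject({"sum": 0}, add_to_sum, log_interval=3)
sparse = LoggedObject({"sum": 0}, add_to_sum, log_interval=5)
for value in range(1, 9):
    frequent.process(value)
    sparse.process(value)

replayed_frequent = frequent.rollback(4)  # restores the snapshot at 3, replays 1 event
replayed_sparse = sparse.rollback(4)      # restores the snapshot at 0, replays 4 events
```

Both objects end in the same state, but the one with the longer interval pays more at rollback time and less during forward execution, which is the trade-off the log interval governs.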

Autonomic Log/Restore for Advanced Optimistic Simulation Systems

2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2010

In this paper we address state recoverability in optimistic simulation systems by presenting an autonomic log/restore architecture. Our proposal is unique in that it jointly provides the following features: (i) log/restore operations are carried out in a manner completely transparent to the application programmer, (ii) the simulation-object state can be scattered across dynamically allocated non-contiguous memory chunks, (iii) two differentiated operating modes, incremental vs non-incremental, coexist via transparent, optimized run-time management of dual versions of the same application layer, with dynamic selection of the best-suited operating mode in different phases of the optimistic simulation run, and (iv) determination of the best-suited mode for any time frame is carried out on the basis of an innovative modeling/optimization approach that takes into account stability of each operating mode vs variations of the model execution parameters.

A load-sharing architecture for high performance optimistic simulations on multi-core machines

2012 19th International Conference on High Performance Computing, 2012

In Parallel Discrete Event Simulation (PDES), the simulation model is partitioned into a set of distinct Logical Processes (LPs) which are allowed to concurrently execute simulation events. In this work we present an innovative approach to load-sharing on multi-core/multiprocessor machines, targeted at the optimistic PDES paradigm, where LPs are speculatively allowed to process simulation events with no preventive verification of causal consistency, and actual consistency violations (if any) are recovered via rollback techniques. In our approach, each simulation kernel instance, in charge of hosting and executing a specific set of LPs, runs a set of worker threads, which can be dynamically activated/deactivated on the basis of a distributed algorithm. The latter relies in turn on an analytical model that provides indications on how to reassign processor/core usage across the kernels in order to handle the simulation workload as efficiently as possible. We also present a real implementation of our load-sharing architecture within the ROme OpTimistic Simulator (ROOT-Sim), namely an open-source C-based simulation platform implemented according to the PDES paradigm and the optimistic synchronization approach. Experimental results for an assessment of the validity of our proposal are presented as well.

The ROme OpTimistic Simulator: Core Internals and Programming Model

Proceedings of the 4th International ICST Conference on Simulation Tools and Techniques, 2011

In this article we overview the ROme OpTimistic Simulator (ROOT-Sim), an open source C/MPI-based simulation package targeted at POSIX systems, which implements a general-purpose parallel/distributed simulation environment relying on the optimistic (i.e. rollback based) synchronization paradigm. It offers a very simple programming model based on the classical notion of simulation-event handlers, to be implemented according to the ANSI-C standard, and transparently supports all the services required to parallelize the execution. It also offers a set of optimized protocols (e.g. CPU scheduling and state log/restore protocols) aimed at minimizing the run-time overhead of the platform, thus allowing for high performance and scalability. Here we overview the core internal mechanisms provided within ROOT-Sim, together with the offered API and the programming model that must be followed in order to produce simulation software that can be transparently run, in a concurrent fashion, on top of the ROOT-Sim layer. Short code examples are also discussed.
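
The event-handler programming model mentioned above can be illustrated with a small sequential sketch in Python. This is not ROOT-Sim's API (which is C-based and is not reproduced here): `run_simulation`, `schedule`, and the `ping_pong` model are hypothetical names, and the engine below is a plain sequential one with no parallelism or rollback. It only shows the general shape of the model: the application supplies a handler, and the engine dispatches events to it in timestamp order.

```python
import heapq
import itertools


def run_simulation(handler, num_objects, initial_events, end_time):
    """Dispatch events in timestamp order to a user-supplied handler (sequential sketch)."""
    states = [{} for _ in range(num_objects)]  # one private state per simulation object
    pending = []                               # min-heap ordered by (timestamp, sequence)
    sequence = itertools.count()               # tie-breaker for equal timestamps

    def schedule(receiver, timestamp, event_type, payload=None):
        heapq.heappush(pending, (timestamp, next(sequence), receiver, event_type, payload))

    for event in initial_events:
        schedule(*event)

    while pending and pending[0][0] <= end_time:
        timestamp, _, receiver, event_type, payload = heapq.heappop(pending)
        handler(receiver, timestamp, event_type, payload, states[receiver], schedule)
    return states


def ping_pong(me, now, event_type, payload, state, schedule):
    # Toy model: count the received event, then send one to the other object.
    state["received"] = state.get("received", 0) + 1
    schedule((me + 1) % 2, now + 1.0, "PING")


final_states = run_simulation(ping_pong, num_objects=2,
                              initial_events=[(0, 0.0, "PING")], end_time=5.0)
```

The handler only touches the state of the object it was invoked for and interacts with other objects by scheduling events, which is what lets an underlying engine parallelize the execution transparently.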

Tuning the Level of Concurrency in Software Transactional Memory: An Overview of Recent Analytical, Machine Learning and Mixed Approaches

Lecture Notes in Computer Science, 2015

Synchronization transparency offered by Software Transactional Memory (STM) must not come at the expense of run-time efficiency, thus demanding from the STM designer the inclusion of mechanisms properly oriented to performance and other quality indexes. In particular, one core issue to cope with in STM is exploiting parallelism while also avoiding thrashing phenomena due to excessive transaction rollbacks, caused by excessively high levels of contention on logical resources, namely concurrently accessed data portions. A means to address run-time efficiency consists in dynamically determining the best-suited level of concurrency (number of threads) to be employed for running the application (or specific application phases) on top of the STM layer. For too low levels of concurrency, parallelism can be hampered. Conversely, over-dimensioning the concurrency level may give rise to the aforementioned thrashing phenomena caused by excessive data contention, an aspect which also results in reduced energy efficiency. In this chapter we overview a set of recent techniques aimed at building "application-specific" performance models that can be exploited to dynamically tune the level of concurrency to the best-suited value. Although they share some base concepts while modeling the system performance vs the degree of concurrency, these techniques rely on disparate methods, such as machine learning or
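
The trade-off described above (too few threads waste parallelism, too many cause thrashing) can be sketched with a plain hill-climbing search over the thread count. This is a deliberately simple stand-in, not one of the model-based techniques the chapter surveys; the function names and the synthetic throughput curve are assumptions made purely for illustration.

```python
def tune_concurrency(measure_throughput, max_threads, start=1):
    """Hill-climb over the number of threads until no neighbor improves throughput."""
    current = start
    best = measure_throughput(current)
    while True:
        neighbors = [n for n in (current - 1, current + 1) if 1 <= n <= max_threads]
        if not neighbors:
            return current
        throughput, candidate = max((measure_throughput(n), n) for n in neighbors)
        if throughput <= best:
            return current  # local optimum: neither neighbor is better
        current, best = candidate, throughput


def synthetic_throughput(threads):
    # Hypothetical curve: gains from parallelism, losses from contention-induced aborts.
    return threads / (1.0 + 0.02 * threads * threads)


chosen_from_low = tune_concurrency(synthetic_throughput, max_threads=16, start=1)
chosen_from_high = tune_concurrency(synthetic_throughput, max_threads=16, start=16)
```

On this synthetic, single-peaked curve both searches settle on the same thread count. On real workloads the curve is noisy and phase-dependent, which is one motivation for the performance-model-based approaches the chapter reviews.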

Research paper thumbnail of Programming agent-based demographic models with cross-state and message-exchange dependencies: A study with speculative PDES and automatic load-sharing

2016 Winter Simulation Conference (WSC), 2016

Agent-based modeling and simulation is a versatile and promising methodology to capture complex i... more Agent-based modeling and simulation is a versatile and promising methodology to capture complex interactions among entities and their surrounding environment. A great advantage is its ability to model phenomena at a macro scale by exploiting simpler descriptions at a micro level. It has been proven effective in many fields, and it is rapidly becoming a de-facto standard in the study of population dynamics. In this article we study programmability and performance aspects of the last-generation ROOT-Sim speculative PDES environment for multi/many-core shared-memory architectures. ROOT-Sim transparently offers a programming model where interactions can be based on both explicit message passing and in-place state accesses. We introduce programming guidelines for systematic exploitation of these facilities in agent-based simulations, and we study the effects on performance of an innovative load-sharing policy targeting these types of dependencies. An experimental assessment with synthetic and real-world applications is provided, to assess the validity of our proposal. 1 INTRODUCTION Context. Agent-based modeling (ABM) is a simulation technique providing abstract representations of a scenario via a descriptive model targeted at reproducing its evolution through its components, including their decision-making capabilities and interaction patterns. An agent, which is a component of the overall descriptive model, can be defined as an entity (theoretical, virtual, or physical) capable of acting on itself, on the environment in which it evolves, and capable of communicating/interacting with other agents (Jennings, Sycara, and Wooldridge 1998). ABM is very effective in capturing interactions at a macro scale which directly or indirectly come from the way agents behave at a micro scale level. 
In this sense, the individual or collective interactions among agents can be used to effectively derive properties of general systems which could be difficult to study by using more traditional simulation techniques. Therefore, the intrinsic expressive power offered by ABM makes it a proven solution to explore complex real-world scenarios, such as disaster rescue (

Research paper thumbnail of Granular Time Warp Objects

Proceedings of the 2016 annual ACM Conference on SIGSIM Principles of Advanced Discrete Simulation - SIGSIM-PADS '16, 2016

A recent trend has shown the relevance of PDES paradigms where simulation objects are no longer s... more A recent trend has shown the relevance of PDES paradigms where simulation objects are no longer seen as fully disjoint entities only interacting via events' scheduling. Particularly, mutual cross-state access (as a form of state sharing) can represent an approach enabling the simplification of the programmer's job. In this article, we present a multi-core oriented Time Warp platform supporting so called granular objects, where cross-state access is transparently enabled jointly with the dynamic clustering (granulation) of objects into groups depending on the volume of mutual state accesses along phases of the model execution. Each group represents an island where activities are sequentially dispatched in timestamp order. Concurrency is still preserved by enabling the optimistic execution of the different islands. Granulated objects do not pay synchronization costs due to mutual causal inconsistencies. Also, the underlying Time Warp platform does not pay memory management (e.g. memory access tracing) overheads to determine that mutual accesses are taking place within a group. Overall, the platform transparently (and dynamically) determines a well-suited granulation of the overall model state, and a corresponding level of concurrency, depending on the actual state access pattern by the simulation code. As far as we know, this is the first study where the problem of clustering Time Warp simulation objects is addressed for the case of in-place crossobject state accesses by the application code, and where dynamic granulation of multiple objects in a larger one is supported in a fully transparent manner. We integrated our proposal in the open source ROOT-Sim platform.

Research paper thumbnail of Hardware-Transactional-Memory Based Speculative Parallel Discrete Event Simulation of Very Fine Grain Models

2015 IEEE 22nd International Conference on High Performance Computing (HiPC), 2015

This article presents an innovative runtime support for speculative parallel processing of discre... more This article presents an innovative runtime support for speculative parallel processing of discrete event simulation models on multi-core architectures, which exploits Hardware-Transactional-Memory (HTM) facilities for the purpose of state recoverability. In this proposal, the speculative updates on the state of the simulation model are executed as concurrent HTM-based transactions that are also in charge of detecting whether the update is consistent with the advancement of logical-time along model execution. Our proposal is fully transparent to the application code. Hence, our HTMbased run-time support can host conventionally developed discrete event models relying on the concept of event-handlers to be dispatched by an underlying simulation engine. Experimental data show that our proposal provides 75% to 92% of the ideal speedup on an Intel Haswell based platform (equipped with 4 physical cores and HTM support) for discrete event models with event granularity ranging between 2 and 12 microseconds. The data also show that these same models cannot be executed efficiently on top of a last generation parallel discrete event simulation platform employing softwarebased recoverability.

Research paper thumbnail of RAMSES: Reversibility-Based Agent Modeling and Simulation Environment with Speculation-Support

Lecture Notes in Computer Science, 2015

Research paper thumbnail of A Study on the Parallelization of Terrain-Covering Ant Robots Simulations

Lecture Notes in Computer Science, 2014

Agent-based simulation is used as a tool for supporting (timecritical) decision making in differe... more Agent-based simulation is used as a tool for supporting (timecritical) decision making in differentiated contexts. Hence, techniques for speeding up the execution of agent-based models, such as Parallel Discrete Event Simulation (PDES), are of great relevance/benefit. On the other hand, parallelism entails that the final output provided by the simulator should closely match the one provided by a traditional sequential run. This is not obvious given that, for performance and efficiency reasons, parallel simulation engines do not allow the evaluation of global predicates on the simulation model evolution with arbitrary timegranularity along the simulation time-axis. In this article we present a study on the effects of parallelization of agent-based simulations, focusing on complementary aspects such as performance and reliability of the provided simulation output. We target Terrain Covering Ant Robots (TCAR) simulations, which are useful in rescue scenarios to determine how many agents (i.e., robots) should be used to completely explore a certain terrain for possible victims within a given time.

Research paper thumbnail of A Framework for High Performance Simulation of Transactional Data Grid Platforms

Proceedings of the Sixth International Conference on Simulation Tools and Techniques, 2013

The large diffusion of shared-memory multi-core machines has impacted the way Parallel Discrete E... more The large diffusion of shared-memory multi-core machines has impacted the way Parallel Discrete Event Simulation (PDES) engines are built. While they were originally conceived as data-partitioned platforms, where each thread is in charge of managing a subset of simulation objects, nowadays the trend is to shift towards share-everything settings. In this scenario, any thread can (in principle) take care of CPU-dispatching pending events bound to whichever simulation object, which helps to fully share the load across the available CPU-cores. Hence, a fundamental aspect to be tackled is to provide an efficient globally-shared pending events' set from which multiple worker threads can concurrently extract events to be processed, and into which they can concurrently insert new produced events to be processed in the future. To cope with this aspect, we present the design and implementation of a concurrent non-blocking pending events' set data structure, which can be seen as a variant of a classical calendar queue. Early experimental data collected with a synthetic stress test are reported, showing excellent scalability of our proposal on a machine equipped with 32 CPU-cores. CCS Concepts •Theory of computation → Data structures design and analysis; Shared memory algorithms; •Computing methodologies → Discrete-event simulation;

Research paper thumbnail of Cache-Aware Memory Manager for Optimistic Simulations

Proceedings of the Fifth International Conference on Simulation Tools and Techniques, 2012

Parallel Discrete Event Simulation is a well known technique for executing complex general-purpos... more Parallel Discrete Event Simulation is a well known technique for executing complex general-purpose simulations where models are described as objects the interaction of which is expressed through the generation of impulsive events. In particular, Optimistic Simulation allows full exploitation of the available computational power, avoiding the need to compute safety properties for the events to be executed. Optimistic Simulation platforms internally rely on several data structures, which are meant to support operations aimed at ensuring correctness, inter-kernel communication and/or event scheduling. These housekeeping and management operations access them according to complex patterns, commonly suffering from misuse of memory caching architectures. In particular, operations like log/restore access data structures on a periodic basis, producing the replacement of in-cache buffers related to the actual working set of the application logic, producing a non-negligible performance drop. In this work we propose generally-applicable design principles for a new memory management subsystem targeted at Optimistic Simulation platforms which can face this issue by wisely allocating memory buffers depending on their actual future access patterns, in order to enhance event-execution memory locality. Additionally, an application-transparent implementation within ROOT-Sim, an open-source generalpurpose optimistic simulation platform, is presented along with experimental results testing our proposal.

Research paper thumbnail of An Evolutionary Algorithm to Optimize Log/Restore Operations within Optimistic Simulation Platforms

Proceedings of the 4th International ICST Conference on Simulation Tools and Techniques, 2011

In this work we address state recoverability in advanced optimistic simulation systems by proposi... more In this work we address state recoverability in advanced optimistic simulation systems by proposing an evolutionary algorithm to optimize at run-time the parameters associated with state log/restore activities. Optimization takes place by adaptively selecting for each simulation object both (i) the best suited log mode (incremental vs non-incremental) and (ii) the corresponding optimal value of the log interval. Our performance optimization approach allows to indirectly cope with hidden effects (e.g., locality) as well as cross-object effects due to the variation of log/restore parameters for different simulation objects (e.g., rollback thrashing). Both of them are not captured by literature solutions based on analytical models of the overhead associated with log/restore tasks. More in detail, our evolutionary algorithm dynamically adjusts the log/restore parameters of distinct simulation objects as a whole, towards a well suited configuration. In such a way, we prevent negative effects on performance due to the biasing of the optimization towards individual simulation objects, which may cause reduced gains (or even decrease) in performance just due to the aforementioned hidden and/or cross-object phenomena. We also present an application-transparent implementation of the evolutionary algorithm within the ROme OpTimistic Simulator (ROOT-Sim), namely an open source, general purpose simulation environment designed according to the optimistic synchronization paradigm. Further, we provide the results of an experimental study testing our proposal on a suite of simulation models for wireless communication systems.

Autonomic Log/Restore for Advanced Optimistic Simulation Systems

2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2010

In this paper we address state recoverability in optimistic simulation systems by presenting an autonomic log/restore architecture. Our proposal is unique in that it jointly provides the following features: (i) log/restore operations are carried out in a manner completely transparent to the application programmer, (ii) the simulation-object state can be scattered across dynamically allocated non-contiguous memory chunks, (iii) two differentiated operating modes, incremental vs non-incremental, coexist via transparent, optimized run-time management of dual versions of the same application layer, with dynamic selection of the best suited operating mode in different phases of the optimistic simulation run, and (iv) determination of the best suited mode for any time frame is carried out on the basis of an innovative modeling/optimization approach that takes into account stability of each operating mode vs variations of the model execution parameters.

A load-sharing architecture for high performance optimistic simulations on multi-core machines

2012 19th International Conference on High Performance Computing, 2012

In Parallel Discrete Event Simulation (PDES), the simulation model is partitioned into a set of distinct Logical Processes (LPs) which are allowed to concurrently execute simulation events. In this work we present an innovative approach to load-sharing on multi-core/multiprocessor machines, targeted at the optimistic PDES paradigm, where LPs are speculatively allowed to process simulation events with no preventive verification of causal consistency, and actual consistency violations (if any) are recovered via rollback techniques. In our approach, each simulation kernel instance, in charge of hosting and executing a specific set of LPs, runs a set of worker threads, which can be dynamically activated/deactivated on the basis of a distributed algorithm. The latter relies in turn on an analytical model that provides indications on how to reassign processor/core usage across the kernels in order to handle the simulation workload as efficiently as possible. We also present a real implementation of our load-sharing architecture within the ROme OpTimistic Simulator (ROOT-Sim), an open-source C-based simulation platform implemented according to the PDES paradigm and the optimistic synchronization approach. Experimental results for an assessment of the validity of our proposal are presented as well.

The ROme OpTimistic Simulator: Core Internals and Programming Model

Proceedings of the 4th International ICST Conference on Simulation Tools and Techniques, 2011

In this article we overview the ROme OpTimistic Simulator (ROOT-Sim), an open-source C/MPI-based simulation package targeted at POSIX systems, which implements a general-purpose parallel/distributed simulation environment relying on the optimistic (i.e. rollback-based) synchronization paradigm. It offers a very simple programming model based on the classical notion of simulation-event handlers, to be implemented according to the ANSI-C standard, and transparently supports all the services required to parallelize the execution. It also offers a set of optimized protocols (e.g. CPU scheduling and state log/restore protocols) aimed at minimizing the run-time overhead of the platform, thus allowing for high performance and scalability. Here we overview the core internal mechanisms provided within ROOT-Sim, together with the offered API and the programming model that is expected to be followed in order to produce simulation software that can be transparently run, in a concurrent fashion, on top of the ROOT-Sim layer. Short code examples are also discussed.

Tuning the Level of Concurrency in Software Transactional Memory: An Overview of Recent Analytical, Machine Learning and Mixed Approaches

Lecture Notes in Computer Science, 2015

Synchronization transparency offered by Software Transactional Memory (STM) must not come at the expense of run-time efficiency, thus demanding from the STM designer the inclusion of mechanisms properly oriented to performance and other quality indexes. In particular, one core issue to cope with in STM is exploiting parallelism while also avoiding thrashing phenomena due to excessive transaction rollbacks, caused by excessively high levels of contention on logical resources, namely concurrently accessed data portions. One means to address run-time efficiency consists in dynamically determining the best-suited level of concurrency (number of threads) to be employed for running the application (or specific application phases) on top of the STM layer. If the level of concurrency is too low, parallelism can be hampered. Conversely, over-dimensioning the concurrency level may give rise to the aforementioned thrashing phenomena caused by excessive data contention, an aspect that also results in reduced energy efficiency. In this chapter we overview a set of recent techniques aimed at building "application-specific" performance models that can be exploited to dynamically tune the level of concurrency to the best-suited value. Although they share some base concepts while modeling the system performance vs the degree of concurrency, these techniques rely on disparate methods, such as machine learning or