Francesco Quaglia - Profile on Academia.edu

Papers by Francesco Quaglia

Proceedings of the 2018 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation

Over the years, Parallel Discrete Event Simulation (PDES) has been enriched with programming facilities to bypass state disjointness across the concurrent Logical Processes (LPs). New supports have been proposed, offering the programmer alternatives to message passing for coding complex relations among LPs. Along this path we find Event & Cross-State (ECS), which allows writing event handlers that can perform in-place accesses to the state of any LP, by simply relying on pointers. This programming model has been shipped with a runtime support enabling concurrent speculative execution of LPs limited to shared-memory machines. In this paper, we present the design of a middleware layer that allows ECS to be ported to distributed-memory clusters of machines. A core application of our middleware is to let ECS-coded models be hosted on top of (low-cost) resources from the Cloud. Overall, ECS-coded models no longer demand powerful shared-memory machines to execute in reasonable time. Thanks to our solution, we retain the possibility to rely on the enriched ECS programming model while still enabling deployments of PDES models on convenient (Cloud-based) infrastructures. An experimental assessment of our proposal is also provided. CCS CONCEPTS • Computing methodologies → Discrete-event simulation; • Theory of computation → Shared memory algorithms; • Software and its engineering → Distributed memory;

Proceeding of the 2001 Winter Simulation Conference (Cat. No.01CH37304)

Checkpointing overhead is a major obstacle to the effectiveness of Time Warp parallel discrete event simulators. Semi-asynchronous checkpointing is a recent solution to tackle this obstacle for Time Warp simulations on distributed memory systems based on Myrinet. In this solution, checkpoint operations are offloaded from the host CPU and are charged to a DMA engine on board the Myrinet network cards. In this paper we report an empirical evaluation of the benefits of semi-asynchronous checkpointing for Time Warp simulations of a large state Personal Communication System (PCS) model. PCS simulation models are typically characterized by high communication locality among the LPs hosted by the same machine; therefore, the hardware on board the Myrinet cards is typically underutilized if used exclusively to support communication. We show that the execution speed of Time Warp simulations of a large state PCS model can be increased when semi-asynchronous checkpointing is adopted.

Proceedings of the 2020 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2020

A rollback operation in a speculative parallel discrete event simulator has traditionally targeted the perfect reconstruction of the state to be restored after a timestamp-order violation. This requires the rollback support to provide specific capabilities and, consequently, to pay the associated costs. In this article we propose approximated rollbacks, which allow a simulation object to perfectly realign its virtual time to the timestamp of the state to be restored, but make the reconstructed state an approximation of what it should really be. The advantage is an important reduction of the cost of managing the state restore task in a rollback phase, as well as of managing the activities (i.e. state saving) that actually enable rollbacks to be executed. Our proposal is suited for stochastic simulations, and explores a tradeoff between the statistical representativeness of the outcome of the simulation run and the execution performance. We provide mechanisms that enable the application programmer to control this tradeoff, as well as simulation-platform level mechanisms that constitute the basis for managing approximate rollbacks in general simulation scenarios. A study on the aforementioned tradeoff is also presented.

ACM Transactions on Modeling and Computer Simulation, 2020

The large diffusion of multi-core machines has pushed the research in the field of Parallel Discrete Event Simulation (PDES) toward new programming paradigms, based on the exploitation of shared memory. On the opposite side, the advent of Cloud computing, and the possibility to group together many (low-cost) virtual machines to form a distributed memory cluster capable of hosting simulation applications, has raised the need to bridge shared memory programming and seamless distributed execution. In this article, we present the design of a distributed middleware that transparently allows a PDES application coded for shared memory systems to run on clusters of (Cloud) resources. Our middleware is based on a synchronization protocol called Event and Cross State Synchronization. It allows cross-simulation-object access by event handlers, thus representing a powerful tool for the development of various types of PDES applications. We also provide data for an experimental assessment of our mi...

Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2016

A recent trend has shown the relevance of PDES paradigms where simulation objects are no longer seen as fully disjoint entities interacting only via event scheduling. In particular, mutual cross-state access (as a form of state sharing) can simplify the programmer's job. In this article, we present a multi-core oriented Time Warp platform supporting so-called granular objects, where cross-state access is transparently enabled jointly with the dynamic clustering (granulation) of objects into groups depending on the volume of mutual state accesses along phases of the model execution. Each group represents an island where activities are sequentially dispatched in timestamp order. Concurrency is still preserved by enabling the optimistic execution of the different islands. Granulated objects do not pay synchronization costs due to mutual causal inconsistencies. Also, the underlying Time Warp platform does not pay memory management (e.g. memory access tracing) overheads to determine that mutual accesses are taking place within a group. Overall, the platform transparently (and dynamically) determines a well-suited granulation of the overall model state, and a corresponding level of concurrency, depending on the actual state access pattern of the simulation code. As far as we know, this is the first study where the problem of clustering Time Warp simulation objects is addressed for the case of in-place cross-object state accesses by the application code, and where dynamic granulation of multiple objects into a larger one is supported in a fully transparent manner. We integrated our proposal in the open source ROOT-Sim platform.

International Journal of Parallel Programming, 2016

Parallelizing (compute intensive) Discrete Event Simulation (DES) applications is a classical approach for speeding up their execution and for making very large/complex simulation models tractable. This has been historically achieved via Parallel DES (PDES) techniques, which are based on partitioning the simulation model into distinct simulation objects (somehow resembling objects in classical object-oriented programming), whose states are disjoint, which are executed concurrently and rely on explicit event-exchange (or event-scheduling) primitives as the means to support mutual dependencies and notification of their state updates. With this approach, the application developer is necessarily forced to reason about state separation across the objects, and is thus not allowed to rely on shared information, such as global variables, within the application code. This implicitly shifts the user-exposed programming model to one where sequential-style global variable accesses within the application code are not allowed. In this article we remove this limitation by providing support for managing global variables in the context of DES code developed in ANSI-C, which gets automatically parallelized. In particular, we focus on speculative (also termed optimistic) PDES systems that run on top of multi-core machines, where simulation objects can concurrently process their events with no guarantee of causal consistency and actual violations of causality rules are recovered through rollback/recovery schemes. In compliance with the nature of speculative processing, in our proposal global variables are transparently mapped to multi-versions, so as to avoid any form of safety predicate verification upon their updates. Consistency is ensured via the introduction of a new rollback/recovery scheme based on detecting global variables' reads on non-correct versions.
At the same time, efficiency in the execution is guaranteed by managing multi-version variables' lists via non

2015 IEEE 22nd International Conference on High Performance Computing (HiPC), 2015

This article presents an innovative runtime support for speculative parallel processing of discrete event simulation models on multi-core architectures, which exploits Hardware-Transactional-Memory (HTM) facilities for the purpose of state recoverability. In this proposal, the speculative updates on the state of the simulation model are executed as concurrent HTM-based transactions that are also in charge of detecting whether the update is consistent with the advancement of logical time along model execution. Our proposal is fully transparent to the application code. Hence, our HTM-based runtime support can host conventionally developed discrete event models relying on the concept of event handlers to be dispatched by an underlying simulation engine. Experimental data show that our proposal provides 75% to 92% of the ideal speedup on an Intel Haswell based platform (equipped with 4 physical cores and HTM support) for discrete event models with event granularity ranging between 2 and 12 microseconds. The data also show that these same models cannot be executed efficiently on top of a last-generation parallel discrete event simulation platform employing software-based recoverability.

Proceedings of the 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2015

The Time Warp synchronization protocol for Parallel Discrete Event Simulation (PDES) is universally considered a viable solution to exploit the intrinsic simulation model parallelism and to provide model execution speedup. Yet, it leads the PDES system to execute events in an order that may generate causal inconsistencies, which need to be recovered via rollback, that is, by restoring a previous (consistent) simulation state any time a causality violation is detected. The rollback operation is so critical for the performance of a Time Warp system that it has been extensively studied in the literature for decades, in search of approaches suitable to optimize it. The proposed solutions can be roughly classified as based on either checkpointing or reverse computing. In this article, we explore the practical design and implementation of an entirely new approach based on the runtime generation of so-called undo code blocks, which are blocks of instructions that reverse the memory side effects generated by the forward execution of the events. However, this is not done by recomputing the original values to be restored, as occurs instead in reverse computing schemes. Hence, the philosophy undo code blocks rely on is similar in spirit to that of undo-logs (as a form of checkpointing). Nevertheless, they are not data logs (as checkpoints are); rather, they are logs of instructions. Our proposal is fully transparent, thanks to the reliance on static software instrumentation (targeting the x86 architecture and Linux systems). Also, as we show, it can be combined with classical checkpointing, so as to further improve the runtime behavior of the state recoverability support as a function of the workload.
We also present experimental results related to our implementation, which is released as free software and fully integrated into the open source ROOT-Sim (The ROme OpTimistic Simulator) package. Experimental data support the viability and effectiveness of our proposal.

The 2005 Symposium on Applications and the Internet

In this work we present a protocol ensuring the e-Transaction guarantee (i.e. a recently proposed end-to-end reliability guarantee) in a Web-based, three-tier transactional system. The protocol does not need any coordination among the replicas of the application server, thus exhibiting negligible overhead in normal behavior. Additionally, it achieves highly efficient fail-over, especially for the case of a back-end database employing Optimistic Concurrency Control (OCC), namely a type of concurrency control well suited for data access performed via the Web. We also present a comparative discussion with existing solutions and a quantitative analysis of the proposed protocol, which clearly quantifies its benefits in terms of reduced user-perceived latency, especially when employed in combination with OCC.

This paper presents a checkpointing scheme for optimistic simulation which is a mixed approach between periodic and probabilistic checkpointing. The latter, based on statistical data collected during the simulation, aims at recording as checkpoints states of a logical process that have high probability to be restored due to rollback (this is done in order to make those states immediately available). The periodic part prevents performance degradation due to state reconstruction (coasting forward) cost whenever the collected statistics do not allow identifying states highly likely to be restored. Hence, this scheme can be seen as a highly general solution to tackle the checkpoint problem in optimistic simulation. A performance comparison with previous solutions is carried out through a simulation study of a store-and-forward communication network in a two-dimensional torus topology.

Programmability and Performance of Parallel ECS-Based Simulation of Multi-agent Exploration Models

Lecture Notes in Computer Science, 2014

A Framework for High Performance Simulation of Transactional Data Grid Platforms

Proceedings of the Sixth International Conference on Simulation Tools and Techniques, 2013

2013 IEEE 12th International Symposium on Network Computing and Applications, 2013

Lecture Notes in Computer Science, 2012

In this article we present SCORe, a scalable one-copy serializable partial replication protocol. Differently from any other literature proposal, SCORe jointly guarantees the following properties: (i) it is genuine, thus ensuring that only the replicas that maintain data accessed by a transaction are involved in its processing, and (ii) it guarantees that read operations always access consistent snapshots, thanks to a one-copy serializable multiversion scheme, which never aborts read-only transactions and spares them from any (distributed) validation phase. This makes SCORe particularly efficient in the presence of read-intensive workloads, as typical of a wide range of real-world applications. We have integrated SCORe into a popular open source distributed data grid and performed a large scale experimental study with well-known benchmarks using both private and public cloud infrastructures. The experimental results demonstrate that SCORe provides stronger consistency guarantees (namely One-Copy Serializability) than existing multiversion partial replication protocols at no additional overhead.

Proceedings of the IEEE Symposium on Reliable Distributed Systems, 2012

This paper introduces SPECULA, a novel replication protocol for Software Transactional Memory systems that seeks maximum overlapping between the transaction processing and replica synchronization phases via speculative processing techniques. By removing the execution of the replica synchronization phase from the critical path of execution of transactions, SPECULA allows threads to pipeline the execution of speculatively executed transactional and/or non-transactional code. The core of SPECULA is a multi-version concurrency control algorithm that supports speculative transaction processing while ensuring the strong consistency criteria (analogous to opacity) that are desirable in non-sandboxed environments like STMs. Via an extensive experimental study, based on a fully-fledged prototype and on both synthetic and standard STM benchmarks, we demonstrated that SPECULA can achieve speed-ups of up to one order of magnitude with respect to state-of-the-art non-speculative replication techniques.

2011 IEEE 30th International Symposium on Reliable Distributed Systems, 2011

In this work we present OSARE, an active replication protocol for transactional systems that combines the usage of Optimistic Atomic Broadcast with a speculative concurrency control mechanism in order to overlap transaction processing and replica synchronization. OSARE biases the speculative serialization of transactions towards an order aligned with the optimistic message delivery order. However, due to the lock-free nature of its concurrency control algorithm, at high concurrency levels, namely when the probability of mismatches between optimistic and final deliveries is higher, OSARE explores additional alternative transaction serialization orders in a lightweight and opportunistic fashion. A simulation study we carried out in the context of Software Transactional Memory systems shows that OSARE achieves robust performance also in scenarios characterized by non-minimal likelihood of reordering between optimistic and final deliveries, providing remarkable speed-up with respect to state-of-the-art speculative replication protocols.

Load sharing for optimistic parallel simulations on multi core machines

ACM SIGMETRICS Performance Evaluation Review, 2012

Parallel Discrete Event Simulation (PDES) is based on the partitioning of the simulation model into distinct Logical Processes (LPs), each one modeling a portion of the entire system, which are allowed to execute simulation events concurrently. This allows exploiting parallel computing architectures to speed up model execution, and to make very large models tractable. In this article we cope with the optimistic approach to PDES, where LPs are allowed to concurrently process their events in a speculative fashion, and rollback/recovery techniques are used to guarantee state consistency in case of causality violations along the speculative execution path. In particular, we present an innovative load sharing approach targeted at optimizing resource usage for fruitful simulation work when running an optimistic PDES environment on top of multi-processor/multi-core machines. Beyond providing the load sharing model, we also define a load sharing oriented architectural scheme, based on a symm...

Transparent Support for Partial Rollback in Software Transactional Memories

Lecture Notes in Computer Science, 2013

Consistent and efficient output-streams management in optimistic simulation platforms

Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation - SIGSIM-PADS '13, 2013

The ROme OpTimistic Simulator: Core Internals and Programming Model

Proceedings of the 4th International ICST Conference on Simulation Tools and Techniques, 2011