Paolo Romano | Instituto Superior Técnico
Papers by Paolo Romano
File and Storage Technologies, 2021
2017 IEEE 16th International Symposium on Network Computing and Applications (NCA)
State-machine replication (SMR) is a fundamental technique for implementing fault-tolerant services. Recently, various works have aimed at enhancing the scalability of SMR by exploiting partial replication techniques. By sharding the state machine across disjoint partitions, and replicating each partition over an independent group of processes, a Partially Replicated State Machine (PRSM) can process operations that involve a single partition by requiring synchronization only among the replicas of that partition, thus achieving higher scalability than SMR. Unfortunately, existing PRSM systems rely on inefficient mechanisms to coordinate the execution of multi-partition operations, which either impose global coordination across all nodes in the system or require inter-partition synchronization on the critical path of operation execution. As such, the performance and scalability of existing PRSM systems are severely hindered in the presence of even a small fraction of multi-partition operations. This paper tackles this issue by presenting Genepi, a PRSM protocol that introduces a novel, highly efficient mechanism for regulating the execution of multi-partition operations. We show, via an experimental evaluation based on both synthetic benchmarks and TPC-C, that Genepi achieves throughput gains of up to 5.5× over existing PRSM systems, with only negligible latency overhead at low load.
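To make the single- vs multi-partition distinction concrete, here is a minimal, hypothetical routing sketch in Java; `ReplicaGroup`, `Operation`, and the coordination placeholder are illustrative assumptions, not Genepi's actual interfaces:

```java
import java.util.Set;

// Single-partition operations are ordered only within their partition's
// replica group; multi-partition operations need cross-partition
// coordination, which is the step Genepi optimizes.
final class PrsmRouter {
    private final ReplicaGroup[] groups; // one consensus group per partition

    PrsmRouter(ReplicaGroup[] groups) { this.groups = groups; }

    void submit(Operation op) {
        Set<Integer> parts = op.partitions();
        if (parts.size() == 1) {
            // Fast path: synchronize only the replicas of this partition.
            groups[parts.iterator().next()].order(op);
        } else {
            // Slow path: all involved partitions must agree on a common
            // position for op in their orders (left abstract here).
            coordinateAcross(parts, op);
        }
    }

    private void coordinateAcross(Set<Integer> parts, Operation op) {
        // Placeholder only: real protocols must additionally agree on one
        // global position for op across the involved partitions.
        for (int p : parts) groups[p].order(op);
    }
}

interface ReplicaGroup { void order(Operation op); }
interface Operation { Set<Integer> partitions(); }
```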
Proceedings of the 2017 Symposium on Cloud Computing
Online services are often deployed over geographically scattered data centers (geo-replication), which allows services to be highly available and reduces access latency. On the down side, to provide ACID transactions, global certification (i.e., across data centers) is needed to detect conflicts between concurrent transactions executing at different data centers. The global certification phase reduces throughput, because transactions need to hold pre-commit locks, and increases client-perceived latency, because global certification lies on the critical path of transaction execution.
2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018
This paper addresses the problem of self-tuning the parallelism degree in Transactional Memory (TM) systems that support parallel nesting (PN-TM). This problem has long been investigated for TMs that do not support nesting but, to the best of our knowledge, has never been studied in the context of PN-TMs. Indeed, the problem's complexity is inherently exacerbated in PN-TMs, which require identifying the optimal parallelism degree not only for top-level transactions but also for nested sub-transactions. The increased dimensionality of the problem raises new challenges (e.g., a larger search space and greater proneness to local maxima) that are unsatisfactorily addressed by self-tuning solutions conceived for flat-nesting TMs. We tackle these challenges by proposing AUTOPN, an online self-tuning system that combines model-driven learning techniques with localized search heuristics to pursue a twofold goal: i) enhance convergence speed by identifying the most promising region of the search space via model-driven techniques, and ii) increase robustness against modeling errors via a final local-search phase aimed at refining the model's prediction. We further address the problem of tuning the duration of the monitoring windows used to collect feedback on the system's performance, by introducing novel, domain-specific mechanisms aimed at striking an optimal trade-off between the latency and the accuracy of the self-tuning process. We integrated AUTOPN with a state-of-the-art PN-TM (JVSTM) and evaluated it via an extensive experimental study. The results highlight that AUTOPN can achieve up to 45× higher accuracy and 4× faster convergence when compared with several online optimization techniques (gradient descent, simulated annealing, and genetic algorithms), some of which had already been used successfully in the context of flat-nesting TMs.
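The model-then-local-search strategy can be illustrated with a small sketch: a model supplies a starting point in the (top-level, nested) parallelism space, and a hill climber refines it. The `measure` callback (throughput at a given configuration) and all names are illustrative assumptions, much simpler than AUTOPN's actual design:

```java
import java.util.function.ToDoubleBiFunction;

// Hill-climbing refinement around a model-predicted starting point, in the
// spirit of AUTOPN's final local-search phase.
final class ParallelismTuner {
    // Returns {topLevelThreads, nestedThreads} after local refinement.
    int[] tune(int modelTop, int modelNested, int maxDegree,
               ToDoubleBiFunction<Integer, Integer> measure) {
        int top = modelTop, nested = modelNested;   // seed from the model
        double best = measure.applyAsDouble(top, nested);
        boolean improved = true;
        while (improved) {                          // explore the 4-neighbourhood
            improved = false;
            int[][] nbrs = {{top + 1, nested}, {top - 1, nested},
                            {top, nested + 1}, {top, nested - 1}};
            for (int[] n : nbrs) {
                if (n[0] < 1 || n[1] < 1 || n[0] > maxDegree || n[1] > maxDegree) continue;
                double t = measure.applyAsDouble(n[0], n[1]);
                if (t > best) { best = t; top = n[0]; nested = n[1]; improved = true; }
            }
        }
        return new int[] {top, nested};
    }
}
```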
Abstract. The notion of permissiveness in Transactional Memory (TM) translates to aborting a transaction only when it cannot be accepted in any history that guarantees a target correctness criterion. Achieving permissiveness, however, comes at a non-negligible cost. This desirable property is often neglected by state-of-the-art TMs, which, in order to maximize the implementation's efficiency, resort to aborting transactions under overly conservative conditions. We identify a novel sweet spot between permissiveness and efficiency by introducing the Time-Warp Multi-version algorithm (TWM), which drastically minimizes spurious aborts with respect to state-of-the-art, highly efficient TMs, while introducing minimal bookkeeping overheads. Further, read-only transactions are abort-free, and both Virtual World Consistency and lock-freedom are ensured. 1 Overview of Time-Warping. Typical MVCC algorithms for TM allow read-only transactions to be serialized "in the past", i.e., befor...
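For readers unfamiliar with the MVCC mechanism the overview refers to, the following sketch shows the standard multi-version read rule that lets read-only transactions serialize "in the past"; it illustrates plain MVCC, not TWM's time-warp commit rule, and all names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// A read-only transaction fixes a snapshot timestamp up front and always
// reads the newest version no younger than that snapshot, so it observes a
// consistent state and never needs to abort.
final class MultiVersionStore {
    private final ConcurrentHashMap<String, ConcurrentSkipListMap<Long, Object>>
            versions = new ConcurrentHashMap<>();

    void commitWrite(String key, long commitTs, Object value) {
        versions.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>())
                .put(commitTs, value);
    }

    Object readAt(String key, long snapshotTs) {
        ConcurrentSkipListMap<Long, Object> v = versions.get(key);
        if (v == null) return null;
        Map.Entry<Long, Object> e = v.floorEntry(snapshotTs); // newest ts <= snapshot
        return e == null ? null : e.getValue();
    }
}
```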
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018
This work presents Speculative Transaction Replication (STR), a protocol that exploits transparent speculation techniques to enhance the performance of geo-distributed, partially replicated transactional data stores. In addition, we define a new consistency model, Speculative Snapshot Isolation (SPSI), that extends the semantics of Snapshot Isolation (SI) to shelter applications from the subtle anomalies that can arise from speculative transaction processing. SPSI extends SI in an intuitive and rigorous fashion by specifying desirable atomicity and isolation guarantees that must hold when using speculative execution. STR provides a form of speculation that is fully transparent to programmers (it does not expose the effects of misspeculations to clients). Since the speculation techniques employed by STR satisfy SPSI, application programs can leverage them transparently, without any source-code modification to applications designed to operate under SI. STR combines two key techniques: speculative reads, which allow transactions to observe pre-committed versions, reducing the 'effective duration' of pre-commit locks and enhancing throughput; and Precise Clocks, a novel timestamping mechanism that uses per-item timestamps together with physical clocks to greatly enhance the probability of successful speculation. We assess STR's performance on up to nine geo-distributed Amazon EC2 data centers, using both synthetic benchmarks and realistic benchmarks (TPC-C and RUBiS). Our evaluation shows that STR achieves throughput gains of up to 11× and latency reductions of up to 10× in workloads characterized by low inter-data-center contention. Furthermore, thanks to a self-tuning mechanism that dynamically and transparently enables and disables speculation, STR offers robust performance even when faced with unfavourable workloads that suffer from high misspeculation rates.
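The per-item, physical-clock timestamping idea can be sketched as follows; the class and its monotonicity rule are a hedged reading of the abstract, not STR's actual Precise Clocks implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Each data item carries the timestamp of its last write, taken from the
// local physical clock but forced to advance monotonically per item.
final class TimestampedItem {
    private final AtomicLong lastWriteTs = new AtomicLong();

    long stampWrite() {
        long now = System.currentTimeMillis();
        long prev, next;
        do {
            prev = lastWriteTs.get();
            next = Math.max(now, prev + 1);  // never move a per-item clock backwards
        } while (!lastWriteTs.compareAndSet(prev, next));
        return next;
    }

    long lastWrite() { return lastWriteTs.get(); }
}
```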
Euro-Par 2020: Parallel Processing, 2020
Non-Volatile Memory (NVM) is an emerging memory technology that aims to eliminate the gap between main memory and stable storage. Nevertheless, today's programs will not readily benefit from NVM, because crash failures may leave the program in an unrecoverable and inconsistent state. In this context, durable transactions have been proposed as a mechanism to ease the adoption of NVM by simplifying the task of programming NVM systems. Existing systems employ either hardware (HW) or software (SW) transactions, with different performance trade-offs. Although SW transactions are flexible and unbounded, they may significantly hurt the performance of short-lived transactions. HW transactional memories, on the other hand, have low overhead but are resource-constrained. In this paper we present NV-PhTM, a transactional system for NVM that delivers the best of both HW and SW transactions by dynamically selecting the best execution mode according to the application's characteristics. NV-PhTM comprises a set of heuristics to guide online phase transitions while retaining persistency in case of crashes during migration. To the best of our knowledge, NV-PhTM is the first phase-based system to provide durable transactions. Experimental results with the STAMP benchmark in a simulated NVM environment show that the proposed heuristics are efficient in guiding phase transitions with low overhead.
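The phased HW/SW execution pattern that NV-PhTM's heuristics govern follows the classic "try hardware, fall back to software" shape sketched below; `Htm` is a hypothetical stub for hardware-transaction intrinsics (e.g., Intel RTM), and the retry threshold is arbitrary:

```java
// Try a bounded number of hardware transactions, then migrate to a software
// path. NV-PhTM's contribution is deciding such transitions online, per
// phase, while keeping data durable; this sketch shows only the shape.
final class PhasedTm {
    private static final int MAX_HTM_RETRIES = 3;

    void atomically(Runnable body, Runnable softwareFallback) {
        for (int i = 0; i < MAX_HTM_RETRIES; i++) {
            if (Htm.begin()) {         // true = hardware transaction started
                body.run();
                Htm.end();             // commit the hardware transaction
                return;
            }
        }
        softwareFallback.run();        // unbounded (and, in NV-PhTM, durable) SW path
    }
}

final class Htm {  // stub standing in for real HTM intrinsics
    static boolean begin() { return false; }  // false = abort or HTM unavailable
    static void end() {}
}
```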
ACM Computing Surveys, 2022
The recent rise of byte-addressable non-volatile memory technologies is blurring the dichotomy between memory and storage. In particular, they allow programmers to access persistent data directly, instead of relying on traditional interfaces such as file and database systems. However, they also bring new challenges, as a failure may leave the program in an unrecoverable and inconsistent state. Consequently, both industry and academia have put considerable effort into making the task of programming with such memories easier while, at the same time, efficient from the runtime perspective. This survey summarizes that body of research, from the abstraction level down to the implementation level. As persistent memory starts to appear commercially, the state-of-the-art research condensed here will help investigators stay up to date quickly, while also motivating others to pursue research in the field.
2016 45th International Conference on Parallel Processing (ICPP), 2016
This work investigates how to combine two powerful abstractions for managing concurrent programming: Transactional Memory (TM) and futures. The former hides from programmers the complexity of synchronizing concurrent accesses to shared data, via the familiar abstraction of atomic transactions. The latter serves to schedule and synchronize the parallel execution of computations whose results are not immediately required. While TM and futures are both widely investigated topics, the problem of how to exploit these two abstractions in synergy is still largely unexplored in the literature. This paper fills this gap by introducing Java Transactional Futures (JTF), a Java-based TM implementation that allows programmers to use futures to coordinate the execution of parallel tasks, while leveraging transactions to synchronize accesses to shared data. JTF provides simple and intuitive semantics for the admissible serialization orders of the futures spawned by transactions, by ensuring that the results produced by a future are always consistent with those one would obtain by executing the future sequentially. Our experimental results show that the use of futures in a TM not only unlocks parallelism within transactions, but also reduces the cost of conflicts among top-level transactions in high-contention workloads.
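The semantics described above can be illustrated with a small usage example; `Tx`, `TxFuture`, and `Account` are hypothetical stand-ins (JTF's real API may differ), and the stub runs the future lazily rather than in parallel:

```java
import java.util.concurrent.Callable;

// A future spawned inside a transaction must return the same result it
// would produce if its body ran sequentially at the spawn point.
final class TransferExample {
    void run(Account a, Account b) {
        Tx.atomic(() -> {
            TxFuture<Integer> fee = Tx.submit(() -> computeFee(a)); // may run in parallel
            a.withdraw(100);
            b.deposit(100 - fee.get()); // observes a result consistent with
                                        // sequential execution at the spawn point
        });
    }

    int computeFee(Account a) { return a.balance() > 1_000 ? 0 : 2; }
}

interface Account { void withdraw(int amt); void deposit(int amt); int balance(); }
interface TxFuture<T> { T get(); }

final class Tx { // stubs: a real PN-TM would run these transactionally and in parallel
    static void atomic(Runnable body) { body.run(); }
    static <T> TxFuture<T> submit(Callable<T> c) {
        return () -> { try { return c.call(); } catch (Exception e) { throw new RuntimeException(e); } };
    }
}
```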
Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, 2015
Classical approaches to performance prediction of computer systems rely on two, typically antithetic, techniques: Machine Learning (ML) and Analytical Modeling (AM). ML takes a black-box approach, which typically achieves very good accuracy in regions of the feature space that have been sufficiently explored during the training process, but has very weak extrapolation power (i.e., poor accuracy in regions for which no, or too few, samples are known). Conversely, AM relies on a white-box approach, whose key advantage is that it requires no or minimal training, hence supporting prompt instantiation of the target system's performance model. However, to ensure their tractability, AM-based performance predictors typically rely on simplifying assumptions, so AM's accuracy is challenged in scenarios that do not match those assumptions. This tutorial describes techniques that exploit AM and ML in synergy in order to get the best of the two worlds. It surveys several such hybrid techniques and presents use cases spanning a wide range of application domains.
Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, 2015
Classical approaches to performance prediction rely on two, typically antithetic, techniques: Machine Learning (ML) and Analytical Modeling (AM). ML takes a black-box approach whose accuracy strongly depends on the representativeness of the dataset used during the initial training phase: it can achieve very good accuracy in areas of the feature space that have been sufficiently explored during training. Conversely, AM techniques require no or minimal training, hence offering the potential for prompt instantiation of the target system's performance model. However, in order to ensure their tractability, they typically rely on a set of simplifying assumptions, and AM's accuracy can be seriously challenged in scenarios (e.g., workload conditions) in which such assumptions do not hold. In this paper we explore several hybrid/gray-box techniques that exploit AM and ML in synergy in order to get the best of the two worlds. We evaluate the proposed techniques in case studies targeting two complex and widely adopted middleware systems: a NoSQL distributed key-value store and a Total Order Broadcast (TOB) service.
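One common way to combine AM and ML, and one of the hybrid flavours this line of work explores, is to train a learner on the analytical model's residual error. The sketch below illustrates that generic scheme under stated assumptions (the `Regressor` interface is hypothetical), not the paper's exact design:

```java
import java.util.function.ToDoubleFunction;

// White-box baseline plus black-box correction: the learner only has to
// model where the analytical model is wrong, which needs far less data
// than learning the whole performance function.
final class GrayBoxPredictor {
    private final ToDoubleFunction<double[]> analyticalModel;
    private final Regressor residualLearner;

    GrayBoxPredictor(ToDoubleFunction<double[]> am, Regressor ml) {
        this.analyticalModel = am;
        this.residualLearner = ml;
    }

    void observe(double[] features, double measuredPerf) {
        // Train on the gap between reality and the white-box prediction.
        residualLearner.learn(features,
                measuredPerf - analyticalModel.applyAsDouble(features));
    }

    double predict(double[] features) {
        return analyticalModel.applyAsDouble(features)
                + residualLearner.predict(features);
    }
}

interface Regressor { void learn(double[] x, double y); double predict(double[] x); }
```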
Lecture Notes in Computer Science, 2015
Acknowledgments: Euro-TM is an Action supported by the COST Association that has gathered researchers from 17 European countries and over 40 institutions. All the authors of the book have actively participated in the activities of Euro-TM. The editors are grateful to the COST programme for supporting the Euro-TM initiative (IC1001 - Transactional Memories: Foundations, Algorithms, Tools, and Applications), a forum that was fundamental for the preparation of this book. The editors also express their gratitude to all authors for their enthusiastic and meticulous cooperation in the preparation of this book. Special thanks go to Maria Couceiro for her valuable support with the editing of the book. This publication is supported by COST.
This year, the 6th edition of the Workshop on Theory of Transactional Memory (WTTM) was co-located with PODC 2014 in Paris and took place on July 14. The objective of WTTM was to discuss new theoretical challenges and recent achievements in the area of transactional computing. Among the various recent developments in the area of Transactional Memory (TM), one of the most relevant was the support for Hardware TM (HTM) introduced in various commercial processors. Unsurprisingly, the recent advent of HTM in commercial CPUs had a major impact on the program of this edition of WTTM, which gathered several works addressing issues related to the programmability, efficiency, and correctness of HTM-based systems, as well as hybrid solutions combining software and hardware TM implementations (HyTM). As in its previous editions, WTTM could count on the generous support of the EuroTM COST Action (IC1001), and on a set of outstanding keynote talks delivered by some of the leading researchers in the area, namely Idit Keidar, Shlomi Dolev, Maged Michael, and Michael Scott, who were invited to present their latest achievements. This edition was dedicated to the 60th birthday of Maurice Herlihy and to his foundational work on Transactional Memory, which was commemorated by Michael Scott in the concluding talk of the event. This report is intended to give the highlights of the problems discussed during the workshop.

Transactional Memory (TM) is a concurrency control mechanism for synchronizing concurrent accesses to shared memory by different threads. It has been proposed as an alternative to lock-based synchronization, with the aim of simplifying concurrent programming while exhibiting good performance. The sequential code is encapsulated in transactions, which are sequences of accesses to shared or local variables that should be executed atomically. A transaction ends either by committing, in which case all of its updates take effect, or by aborting, in which case all of its updates are discarded.

1 TM Correctness and Universal Constructions

Idit Keidar opened the workshop with a talk presenting joint work with Kfir Lev-Ari and Gregory Chockler on the characterization of correctness for shared data structures. The idea pursued in this work is to replace the classic and overly conservative read-set validation technique (which checks that all read variables have not changed since they were first read) with the verification of abstract conditions over the shared variables, called base conditions. Reading values that satisfy some base condition at every point in time implies the correctness of read-only operations. The resulting correctness guarantee, however, is found not to be equivalent to linearizability, and can be captured through two new conditions: validity and regularity. The former requires that a read-only operation never reach a state unreachable in a sequential execution; the latter generalizes Lamport's notion of regularity [17] to arbitrary data structures. An extended version of the work presented at WTTM also appeared in the latest edition of DISC [18].

Claire Capdevielle presented her joint work with Colette Johnen and Alessia Milani on solo-fast universal constructions for deterministic abortable objects, which are objects ensuring that, if several processes contend to operate on them, a special abort response may be returned. Such a response indicates that the operation failed, and it guarantees that an aborted operation does not take effect [13]. Operations that do not abort return a response that is legal with respect to the sequential specification of the object. The work presented uses only read/write registers when there is no contention, and stronger synchronization primitives, e.g., CAS, when contention occurs [3]. The authors propose a construction with a lightweight helping mechanism that applies to objects that can return an abort event to indicate the failure of an operation.

Sandeep Hans presented joint work with Hagit Attiya, Alexey Gotsman, and Noam Rinetzky evaluating TMS1 as a consistency criterion that is necessary and sufficient for the case where local variables are rolled back upon transaction aborts [2]. The authors note that TMS [9] is not trivially formulated; in particular, its formulation allows aborted and live transactions to have different views of the system state. Their proof reveals some natural, but subtle, assumptions on the TM that are required for the equivalence result.
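To fix intuition about abortable objects, here is a deliberately simplified sketch of the abort semantics only; it is not the solo-fast construction (it always uses CAS, whereas the presented construction uses plain reads and writes when uncontended):

```java
import java.util.concurrent.atomic.AtomicReference;

// Under contention an operation may return a special ABORT response and is
// guaranteed to have taken no effect; successful operations behave like the
// sequential object.
final class AbortableRegister<T> {
    static final Object ABORT = new Object();
    private final AtomicReference<T> value = new AtomicReference<>();

    Object write(T expected, T update) {
        // A failed CAS signals contention: abort instead of retrying.
        return value.compareAndSet(expected, update) ? update : ABORT;
    }

    T read() { return value.get(); }
}
```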
ACM SIGMETRICS Performance Evaluation Review, 2014
The advent of multi-core architectures has brought concurrent programming to the forefront of software development. In this context, Transactional Memory (TM) has gained increasing popularity as a simpler, attractive alternative to traditional lock-based synchronization. The recent integration of Hardware TM (HTM) in the latest generation of Intel commodity processors has turned TM into a mainstream technology, raising a number of questions about its future and that of concurrent programming. To evaluate the potential impact of Intel's HTM, we conducted the largest study on TM to date, comparing different locking techniques, hardware and software TMs, and different combinations of these mechanisms, from the dual perspective of performance and power consumption. Based on the results, we perform a workload characterization that helps programmers better exploit the currently available TM facilities, and we identify important research directions.
ACM SIGAPP Applied Computing Review, 2014
Hyperspace hashing is a recent multi-dimensional indexing technique for distributed key-value stores that aims at supporting efficient queries over multiple object attributes. However, the advantage of supporting complex queries comes at the cost of a complex configuration. In this paper we address the problem of automating the configuration of this innovative distributed indexing mechanism. We first show that a misconfiguration may significantly affect the performance of the system. We then derive a performance model that provides key insights into the behaviour of hyperspace hashing. Based on this model, we derive a technique to automatically and dynamically select the best configuration.
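The gist of hyperspace hashing, and the configuration knobs the paper tunes, can be sketched as follows: each searchable attribute is hashed onto one axis of a grid, and an object's cell (hence its server) follows from its attribute values. Which attributes become axes and the per-axis granularity, fixed by hand below, are exactly the choices a misconfiguration gets wrong:

```java
import java.util.Map;

// Illustrative coordinate mapping only; real systems add refinements
// (e.g., subspaces and key dimensions) that are omitted here.
final class HyperspaceIndex {
    private final String[] axes;     // one searchable attribute per dimension
    private final int cellsPerAxis;  // granularity knob: more cells = finer partitioning

    HyperspaceIndex(String[] axes, int cellsPerAxis) {
        this.axes = axes;
        this.cellsPerAxis = cellsPerAxis;
    }

    int[] coordinates(Map<String, String> attributes) {
        int[] coord = new int[axes.length];
        for (int i = 0; i < axes.length; i++) {
            coord[i] = Math.floorMod(attributes.get(axes[i]).hashCode(), cellsPerAxis);
        }
        return coord; // the cell determines which server(s) store the object;
                      // a query fixing some attributes narrows to a grid slice
    }
}
```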
Lecture Notes in Computer Science, 2014
Transactional Memory (TM) is an emerging paradigm that promises to ease the development of parallel applications. Due to its inherently speculative nature, however, TM can suffer performance degradation in the presence of conflict-intensive workloads. A key technique to tackle this issue consists in dynamically regulating the number of concurrent threads, which allows for selecting the concurrency level that best fits the intrinsic parallelism of a specific application. In this area, several self-tuning approaches have been proposed for software-based implementations of TM (STM). In this paper we investigate the effectiveness of these techniques when applied to Hardware TM (HTM), a theme that is particularly relevant and timely given the recent integration of hardware support for TM in the latest generation of mainstream Intel processors. Our study, conducted on Intel's implementation of HTM, identifies several issues with employing techniques originally conceived for STM. Motivated by these findings, we propose an innovative machine-learning-based technique explicitly designed to account for the peculiarities of HTM systems, and we demonstrate its advantages, in terms of higher accuracy and shorter learning times, using the STAMP benchmark suite.
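Concurrency regulation needs an actuation mechanism as well as a predictor; the sketch below shows one common actuation shape (a resizable semaphore gating transaction execution), with the learned choice of level left abstract. Names are illustrative, not from the paper:

```java
import java.util.concurrent.Semaphore;

// Caps how many threads may execute transactions at once; a controller
// resizes the cap when the recommended concurrency level changes.
final class ConcurrencyThrottle {
    private final Semaphore permits;
    private int current;

    ConcurrencyThrottle(int initialLevel) {
        this.permits = new Semaphore(initialLevel);
        this.current = initialLevel;
    }

    void runTransaction(Runnable txBody) throws InterruptedException {
        permits.acquire();
        try { txBody.run(); } finally { permits.release(); }
    }

    synchronized void setLevel(int target) {
        if (target > current) {
            permits.release(target - current);   // widen immediately
        } else {
            // Shrinking waits for in-flight transactions to drain permits.
            for (int i = 0; i < current - target; i++) permits.acquireUninterruptibly();
        }
        current = target;
    }
}
```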