Making the fast case common and the uncommon case simple in unbounded transactional memory
Related papers
Unrestricted transactional memory: Supporting I/O and system calls within transactions
2006
Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, enabling programmers to exploit the soon-to-be-ubiquitous multi-core designs. Transactions are simply segments of code that are guaranteed to execute without interference from other concurrently-executing threads. The hardware executes transactions in parallel, ensuring non-interference via abort/rollback/restart when conflicts are detected.
Mechanisms for unbounded, conflict-robust hardware transactional memory
2010
With shared-memory multiprocessing becoming the norm in contexts ranging from webservers to mobile devices, the task of developing high-performance parallel programs is being faced by more programmers than ever before. One key challenge in developing such programs is the need to synchronize accesses to shared memory made by different threads. Implementing synchronization that is both (1) correct and (2) not a performance bottleneck has historically been a challenging task.
EcoTM: Conflict-aware Economical Unbounded Hardware Transactional Memory
Procedia Computer Science, 2013
Transactional Memory (TM) is a promising paradigm for parallel programming. TM allows a thread to make a series of memory accesses as a single, atomic, transaction, while avoiding deadlocks, livelocks, and other problems commonly associated with lock-based programming. In this paper we explore Hardware support for TM (HTM). In particular, we explore how HTM can efficiently support transactions of nearly unlimited size.
A Practical Transactional Memory Interface
Lecture Notes in Computer Science, 2015
Hardware transactional memory (HTM) is becoming widely available on modern platforms. However, software using HTM requires at least two carefully-coordinated code paths: one for transactions, and at least one for when transactions either fail, or are not supported at all. We present the MCMS interface that allows a simple design of fast concurrent data structures. MCMS-based code can execute fast when HTM support is provided, but it also executes well on platforms that do not support HTM, and it handles transaction failures as well. To demonstrate the advantage of such an abstraction, we designed MCMS-based linked-list and tree algorithms. The list algorithm outperforms all known lock-free linked-lists by a factor of up to 2.15x. The tree algorithm builds on Ellen et al. [7] and outperforms it by a factor of up to 1.37x. Both algorithms are considerably simpler than their lock-free counterparts.
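The "two code paths" requirement mentioned in this abstract can be illustrated with a minimal sketch. It is not the MCMS interface itself: `TransactionAborted`, `run_with_fallback`, and `max_attempts` are hypothetical names, and the fast path here is an ordinary Python callable standing in for a hardware transaction.

```python
class TransactionAborted(Exception):
    """Raised by the hypothetical fast path when its transaction fails."""


def run_with_fallback(fast_path, slow_path, max_attempts=3):
    """Try the transactional fast path a few times, then fall back.

    fast_path: callable standing in for a hardware transaction (assumption).
    slow_path: callable standing in for a lock-based or lock-free fallback.
    """
    for _ in range(max_attempts):
        try:
            return fast_path()
        except TransactionAborted:
            # The transaction failed; retry until the budget is exhausted.
            continue
    # The fast path kept failing (or is unsupported): use the fallback.
    return slow_path()
```

The retry budget is a design choice: a small bound keeps the common case fast while guaranteeing that the fallback path eventually runs.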
2006
High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they have a number of disadvantages, including possibly unnecessary serialization, and possible deadlock. Transactional memory is an alternative mechanism that makes parallel programming easier. With transactional memory, a transaction provides atomic and serializable operations on an arbitrary set of memory locations. When a transaction commits, all operations within the transaction become visible to other threads. When it aborts, all operations in the transaction are rolled back.
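The commit and abort behaviour described in this abstract can be sketched with a toy, single-threaded model. The `Transaction` class and its method names are hypothetical, and the sketch buffers writes and discards them on abort, which is only one possible way to realise rollback; it performs no conflict detection.

```python
class Transaction:
    """Toy write-buffering transaction over a shared dict (illustration only)."""

    def __init__(self, memory):
        self.memory = memory   # shared state visible to every thread
        self.writes = {}       # private buffer of uncommitted writes

    def read(self, addr):
        # A transaction sees its own buffered writes before shared memory.
        return self.writes.get(addr, self.memory.get(addr))

    def write(self, addr, value):
        # Buffer the write; shared memory is untouched until commit.
        self.writes[addr] = value

    def commit(self):
        # Publish all buffered writes to shared memory.
        self.memory.update(self.writes)
        self.writes = {}

    def abort(self):
        # Discard buffered writes; shared memory never saw them.
        self.writes = {}
```

For example, after `t.write("x", 2)` the shared dict still holds the old value of `"x"`; only `t.commit()` makes the new value visible, and `t.abort()` leaves the shared dict unchanged.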
On the Cost of Concurrency in Transactional Memory
Lecture Notes in Computer Science, 2011
The crux of software transactional memory (STM) is to combine an easy-to-use programming interface with an efficient utilization of the concurrent computing abilities provided by modern machines. But does this combination come with an inherent cost?
Dependence-aware transactional memory for increased concurrency
2008
Transactional memory (TM) is a promising paradigm for helping programmers take advantage of emerging multicore platforms. Though they perform well under low contention, hardware TM systems have a reputation of not performing well under high contention, as compared to locks. This paper presents a model and implementation of dependence-aware transactional memory (DATM), a novel solution to the problem of scaling under contention. Unlike many proposals to deal with write-shared data (which arise in common data structures like counters and linked lists), DATM operates transparently to the programmer.
A scalable, non-blocking approach to transactional memory
… , 2007. HPCA 2007. …, 2007
Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, priority inversion, convoying). For TM to be adopted in the long term, not only does it need to deliver on these promises, but it needs to scale to a high number of processors. To date, proposals for scalable TM have relegated livelock issues to user-level contention managers. This paper presents the first scalable TM implementation for directory-based distributed shared memory systems that is livelock-free without the need for user-level intervention. The design is a scalable implementation of optimistic concurrency control that supports parallel commits with a two-phase commit protocol, uses write-back caches, and filters coherence messages. The scalable design is based on Transactional Coherence and Consistency (TCC), which supports continuous transactions and fault isolation. A performance evaluation of the design using both scientific and enterprise benchmarks demonstrates that the directory-based TCC design scales efficiently for NUMA systems up to 64 processors.
Distributed Computing, 1997
As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load–Linked/Store–Conditional operation on a single word. Building on the hardware based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a Load–Linked/Store–Conditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures shows that our method always outperforms the non-blocking translation methods in the style of Barnes, and outperforms Herlihy’s translation method for sufficiently large numbers of processors. The key to the efficiency of our software-transactional approach is that unlike Barnes style methods, it is not based on a costly “recursive helping” policy.
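The semantics of the k-word compare&swap operation mentioned in this abstract can be sketched as follows. This shows only what the operation does, not how the paper implements it: the sketch uses a global lock for atomicity, whereas the STM described above is non-blocking and built on Load-Linked/Store-Conditional. The names `k_word_cas` and `_cas_lock` are hypothetical.

```python
import threading

# A single global lock stands in for atomicity (assumption for illustration;
# the abstract's STM is non-blocking and does not work this way).
_cas_lock = threading.Lock()


def k_word_cas(memory, addrs, expected, new):
    """Atomically update k words if all of them hold their expected values.

    memory:   mutable sequence acting as shared memory.
    addrs:    the k indices to check and update.
    expected: the k values the words must currently hold.
    new:      the k values to store on success.
    Returns True if the swap happened, False if any word differed.
    """
    with _cas_lock:
        if all(memory[a] == e for a, e in zip(addrs, expected)):
            for a, n in zip(addrs, new):
                memory[a] = n
            return True
        return False
```

The operation is all-or-nothing: if any one of the k words differs from its expected value, none of the words is modified.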
Making the fast case common and the uncommon case simple in unbounded transactional memory
Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, allowing programmers to exploit more effectively the soon-to-be-ubiquitous multi-core designs. Several recent proposals have extended the original bounded transactional memory to unbounded transactional memory, a crucial step toward transactions becoming a general-purpose primitive. Unfortunately, supporting the concurrent execution of an unbounded number of unbounded transactions is challenging, and as a result, many proposed implementations are complex.