A Practical Transactional Memory Interface

Transactional memory: Architectural support for lock-free data structures

ACM SIGARCH Computer Architecture News, 1993

A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object. In highly concurrent systems, lock-free data structures avoid common problems associated with conventional locking techniques, including priority inversion, convoying, and difficulty of avoiding deadlock. This paper introduces transactional memory, a new multiprocessor architecture intended to make lock ...
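For readers new to the terminology, here is a minimal sketch (not from the paper) of what a lock-free data structure looks like in practice: a Treiber-style stack built on compare-and-swap in C++. A thread preempted between reading the head and attempting the CAS blocks nobody else; its CAS simply fails and is retried. Memory reclamation and the ABA problem are deliberately ignored to keep the sketch short.

```cpp
#include <atomic>
#include <optional>
#include <utility>

// Illustrative Treiber-style lock-free stack (sketch only, not from the paper).
template <typename T>
class LockFreeStack {
    struct Node { T value; Node* next; };
    std::atomic<Node*> head_{nullptr};

public:
    void push(T value) {
        Node* n = new Node{std::move(value), head_.load()};
        // compare_exchange_weak reloads head_ into n->next on failure, so just retry.
        while (!head_.compare_exchange_weak(n->next, n)) { }
    }

    std::optional<T> pop() {
        Node* n = head_.load();
        while (n && !head_.compare_exchange_weak(n, n->next)) { }
        if (!n) return std::nullopt;
        T v = std::move(n->value);
        // NOTE: deleting here is unsafe if another thread may still dereference n;
        // a production version needs hazard pointers, epochs, or similar.
        delete n;
        return v;
    }
};
```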

Hybrid transactional memory

2006

High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they have a number of disadvantages, including possibly unnecessary serialization, and possible deadlock. Transactional memory is an alternative mechanism that makes parallel programming easier. With transactional memory, a transaction provides atomic and serializable operations on an arbitrary set of memory locations. When a transaction commits, all operations within the transaction become visible to other threads. When it aborts, all operations in the transaction are rolled back.
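As a concrete illustration of the commit/abort semantics and of the hybrid idea of mixing hardware and software execution, here is a hedged sketch using Intel's RTM intrinsics (_xbegin/_xend/_xabort from <immintrin.h>) with a mutex fallback. The variable names, retry policy, and fallback flag are invented for illustration and are not the mechanism proposed in this paper.

```cpp
#include <immintrin.h>   // Intel RTM intrinsics: _xbegin, _xend, _xabort (x86 with RTM)
#include <atomic>
#include <mutex>

// Hedged sketch of a hybrid execution path (not this paper's design): try the update
// as a hardware transaction a few times, then fall back to a conventional lock.
static std::mutex fallback_lock;
static std::atomic<bool> fallback_held{false};  // hardware path reads this to stay exclusive

void transfer(long& from, long& to, long amount) {
    for (int attempt = 0; attempt < 3; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Inside a hardware transaction: if the software path holds the lock, abort.
            if (fallback_held.load())
                _xabort(0xff);
            from -= amount;          // speculative writes, invisible until commit
            to   += amount;
            _xend();                 // commit: both writes become visible atomically
            return;
        }
        // Aborted before reaching _xend(); all speculative writes were rolled back.
    }
    // Software fallback: plain mutual exclusion, always makes progress.
    std::lock_guard<std::mutex> g(fallback_lock);
    fallback_held.store(true);
    from -= amount;
    to   += amount;
    fallback_held.store(false);
}
```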

A scalable, non-blocking approach to transactional memory

HPCA, 2007

Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, priority inversion, convoying). For TM to be adopted in the long term, not only does it need to deliver on these promises, but it needs to scale to a high number of processors. To date, proposals for scalable TM have relegated livelock issues to user-level contention managers. This paper presents the first scalable TM implementation for directory-based distributed shared memory systems that is livelock-free without the need for user-level intervention. The design is a scalable implementation of optimistic concurrency control that supports parallel commits with a two-phase commit protocol, uses write-back caches, and filters coherence messages. The scalable design is based on Transactional Coherence and Consistency (TCC), which supports continuous transactions and fault isolation. A performance evaluation of the design using both scientific and enterprise benchmarks demonstrates that the directory-based TCC design scales efficiently for NUMA systems up to 64 processors.
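The hardware protocol itself cannot be reproduced in a few lines, but the optimistic, two-phase-commit flavor of the design can be suggested in software: run speculatively against buffered writes, then validate the read set and, only if it is still consistent, publish the write set. The sketch below is an invented, greatly simplified software analogue, not the directory-based TCC protocol.

```cpp
#include <atomic>
#include <mutex>
#include <unordered_map>

// Invented illustration of optimistic concurrency with a two-phase commit.
struct VersionedCell {
    std::atomic<unsigned> version{0};
    std::atomic<long>     value{0};
};

struct Transaction {
    std::unordered_map<VersionedCell*, unsigned> read_set;   // cell -> version observed
    std::unordered_map<VersionedCell*, long>     write_set;  // cell -> buffered new value

    long read(VersionedCell& c) {
        auto w = write_set.find(&c);
        if (w != write_set.end()) return w->second;           // read-your-own-writes
        read_set.emplace(&c, c.version.load());               // remember the version we saw
        return c.value.load();                                 // (a real STM rechecks this pair)
    }

    void write(VersionedCell& c, long v) { write_set[&c] = v; }

    bool commit(std::mutex& commit_lock) {
        std::lock_guard<std::mutex> g(commit_lock);           // serializes only the commit phases
        for (auto& [cell, ver] : read_set)                    // phase 1: validate the read set
            if (cell->version.load() != ver) return false;    // conflict: abort, caller retries
        for (auto& [cell, val] : write_set) {                 // phase 2: publish the write set
            cell->value.store(val);
            cell->version.fetch_add(1);
        }
        return true;
    }
};
```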

The Software Stack for Transactional Memory: Challenges and Opportunities

Transactional memory systems apply the experience of the database community to the general problem of parallel programming, with the goal of providing a simple parallel programming model that delivers on the performance potential of multi-processor systems. Although initial research into both software-only and hardware-supported transactional memory has shown promising results, there are many challenges to creating a fully transactional software stack. Although today's software stack has some limited use of transactional programming, many parts of the stack, from basic data structures to the operating system and program runtimes, contain at least some lock-based code. In code with coarse-grained locking, transactions provide an opportunity to improve performance. In code with fine-grained locking, transactions provide an opportunity to simplify code while reducing overhead.

Simplifying concurrent algorithms by exploiting hardware transactional memory

Proceedings of the …, 2010

We explore the potential of hardware transactional memory (HTM) to improve concurrent algorithms. We illustrate a number of use cases in which HTM enables significantly simpler code to achieve similar or better performance than existing algorithms for conventional architectures. We use Sun's prototype multicore chip, codenamed Rock, to experiment with these algorithms, and discuss ways in which its limitations prevent better results, or would prevent production use of algorithms even if they are successful. Our use cases include concurrent data structures such as double-ended queues, work-stealing queues and scalable non-zero indicators, as well as a scalable malloc implementation and a simulated annealing application. We believe that our paper makes a compelling case that HTM has substantial potential to make effective concurrent programming easier, and that we have made valuable contributions in guiding designers of future HTM features to exploit this potential.
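The central claim, that the code inside a transaction can simply be the sequential code, can be illustrated with a small hedged sketch: a generic helper that tries a hardware transaction (Intel RTM intrinsics stand in for Rock's HTM) and falls back to a spin lock, wrapped around an ordinary std::deque. The helper, its retry policy, and the lock encoding are invented for illustration.

```cpp
#include <immintrin.h>   // _xbegin/_xend/_xabort (Intel RTM), a stand-in for Rock's HTM
#include <atomic>
#include <deque>
#include <optional>

// Invented helper: run the body as a hardware transaction if possible, otherwise under a
// simple spin lock. The hardware path reads the lock word so that a committed transaction
// can never interleave with a thread running the body under the lock.
static std::atomic<bool> locked{false};

template <typename F>
auto run_atomically(F body) {
    for (int attempt = 0; attempt < 5; ++attempt) {
        if (_xbegin() == _XBEGIN_STARTED) {
            if (locked.load()) _xabort(0x01);   // lock held: give way to the software path
            auto result = body();
            _xend();
            return result;
        }
        // On real hardware, memory allocation inside the body may force the fallback path.
    }
    while (locked.exchange(true)) { /* spin */ }  // software fallback
    auto result = body();
    locked.store(false);
    return result;
}

// The payoff: each deque operation is just the sequential code, with no per-node locks
// and no hand-crafted lock-free algorithm.
static std::deque<int> dq;

void push_left(int v)  { run_atomically([&] { dq.push_front(v); return true; }); }
void push_right(int v) { run_atomically([&] { dq.push_back(v);  return true; }); }

std::optional<int> pop_right() {
    return run_atomically([&]() -> std::optional<int> {
        if (dq.empty()) return std::nullopt;
        int v = dq.back();
        dq.pop_back();
        return v;
    });
}
```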

Unrestricted transactional memory: Supporting I/O and system calls within transactions

2006

Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, enabling programmers to exploit the soon-to-be-ubiquitous multi-core designs. Transactions are simply segments of code that are guaranteed to execute without interference from other concurrently-executing threads. The hardware executes transactions in parallel, ensuring non-interference via abort/rollback/restart when conflicts are detected.
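For contrast with the paper's goal of supporting I/O and system calls inside transactions directly, the sketch below shows the common software workaround such support seeks to avoid: buffering output in a per-transaction log and performing it only once the transaction has committed. The DeferredOutput type and its interface are invented for illustration.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Invented type: without direct support for I/O in transactions, a TM runtime typically
// cannot let transactional code call printf (an aborted transaction cannot "un-print"),
// so output is buffered and replayed only after commit.
struct DeferredOutput {
    std::vector<std::string> pending;

    void print(std::string line) {            // called inside the transaction
        pending.push_back(std::move(line));   // buffered, not yet visible
    }
    void on_commit() {                        // run after the transaction commits
        for (const auto& line : pending) std::fputs(line.c_str(), stdout);
        pending.clear();
    }
    void on_abort() { pending.clear(); }      // discarded along with the memory state
};
```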

On the Cost of Concurrency in Transactional Memory

Lecture Notes in Computer Science, 2011

The crux of software transactional memory (STM) is to combine an easy-to-use programming interface with an efficient utilization of the concurrent computing abilities provided by modern machines. But does this combination come with an inherent cost?

Software Transactional Memory

Distributed Computing, 1997

As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load–Linked/Store–Conditional operation on a single word. Building on the hardware-based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a Load–Linked/Store–Conditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures shows that our method always outperforms the non-blocking translation methods in the style of Barnes, and outperforms Herlihy's translation method for sufficiently large numbers of processors. The key to the efficiency of our software-transactional approach is that unlike Barnes-style methods, it is not based on a costly "recursive helping" policy.
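To make the k-word compare&swap interface concrete, here is a rough sketch in C++ with invented names. Note that it is blocking (it holds per-word ownership flags while it works) and simply fails on contention, whereas the STM in the paper is non-blocking and built on LL/SC; the sketch only conveys the multi-word atomic-update interface.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Rough, blocking sketch of a k-word compare&swap (names invented for illustration).
struct Word {
    std::atomic<bool> owned{false};   // stand-in for an STM ownership record
    long value{0};
};

struct Update { Word* w; long expected; long desired; };

bool k_word_cas(std::vector<Update> ups) {
    // Acquire ownership in a fixed (address) order so concurrent k-word operations
    // cannot deadlock against each other.
    std::sort(ups.begin(), ups.end(),
              [](const Update& a, const Update& b) { return a.w < b.w; });

    std::size_t acquired = 0;
    bool ok = true;
    for (; acquired < ups.size(); ++acquired) {
        bool expected_free = false;
        if (!ups[acquired].w->owned.compare_exchange_strong(expected_free, true)) {
            ok = false;                       // contention: bail out (a real STM would retry or help)
            break;
        }
    }
    if (ok)                                   // check all expected values while owned
        for (const Update& u : ups)
            if (u.w->value != u.expected) { ok = false; break; }
    if (ok)                                   // all matched: install the new values
        for (const Update& u : ups)
            u.w->value = u.desired;
    for (std::size_t i = 0; i < acquired; ++i)  // release whatever we acquired
        ups[i].w->owned.store(false);
    return ok;
}
```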

Hybrid transactional memory to accelerate safe lock-based transactions

2008

To reduce the overhead of Software Transactional Memory (STM) there are many recent proposals to build hybrid systems that use architectural support either to accelerate parts of a particular STM algorithm (Ha-TM), or to form a hybrid system allowing hardware transactions and software transactions to inter-operate in the same address space (Hy-TM). In this paper we introduce a Hy-TM design based on multi-reader, single-writer locking when a transaction tries to commit. This approach is the first Hy-TM to combine three desirable features: (i) execution whether or not the architectural support is present, (ii) execution of a single common code path, whether a transaction is running in software or hardware, (iii) immunity, for correctly synchronized programs, from the "privatization" problem. Our architectural support can be any traditional HTM supporting bounded or unbounded-size transactions, along with an instruction to test whether or not the current thread is running inside a hardware transaction. With this we carefully design the Hy-TM so that portions of its work can be elided when running a transaction in hardware mode.
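Feature (ii), a single common code path with work elided in hardware mode, can be sketched with Intel's _xtest() intrinsic, which reports whether the caller is currently executing inside a hardware transaction. The lock encoding and policy below are invented; they are not the paper's actual Hy-TM design.

```cpp
#include <immintrin.h>   // _xtest / _xabort (x86 RTM intrinsics)
#include <atomic>

// Sketch: the same function is executed by software and hardware transactions, but the
// expensive bookkeeping is elided when _xtest() says we are inside a hardware transaction.
static std::atomic<int> commit_lock{0};   // invented encoding: >0 reader count, -1 writer

void acquire_for_read() {
    if (_xtest()) {
        // Hardware mode: just read the lock word. The HTM tracks it in our read set,
        // so any later writer will abort this transaction; no CAS is needed.
        if (commit_lock.load(std::memory_order_relaxed) < 0)
            _xabort(0x01);               // a software writer is committing: abort
        return;
    }
    // Software mode: take a real reader slot on the multi-reader/single-writer lock.
    for (;;) {
        int c = commit_lock.load(std::memory_order_acquire);
        if (c >= 0 &&
            commit_lock.compare_exchange_weak(c, c + 1, std::memory_order_acquire))
            return;
    }
}
```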

LogTM: Log-based Transactional Memory

Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (retained if the transaction aborts) values. Most (hardware) TM systems leave old values "in place" (the target memory address) and buffer new values elsewhere until commit. This makes aborts fast, but penalizes (the much more frequent) commits. In this paper, we present a new implementation of transactional memory, Log-based Transactional Memory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place. LogTM makes two additional contributions. First, LogTM extends a MOESI directory protocol to enable both fast conflict detection on evicted blocks and fast commit (using lazy cleanup). Second, LogTM handles aborts in (library) software with little performance penalty. Evaluations running micro-and SPLASH-2 benchmarks on a 32way multiprocessor support our decision to optimize for commit by showing that only 1-2% of transactions abort.