Transactional coherence and consistency: simplifying parallel hardware and software

Transactional coherence and consistency

2004

With uniprocessor systems running into instruction-level parallelism (ILP) limits and fundamental VLSI constraints, parallel architectures provide a realistic path toward scalable performance by letting programmers exploit thread-level parallelism (TLP) in more explicitly distributed architectures. As a result, single-board and single-chip multiprocessors are becoming the norm for server and embedded computing, and are even starting to appear on desktop platforms. Nevertheless, the complexity of parallel application development and the continued difficulty of implementing efficient and correct parallel architectures have limited these architectures' potential.

Transactional Memory Coherence and Consistency

ACM SIGARCH Computer Architecture News, 2004

In this paper, we propose a new shared memory model: Transactional memory Coherence and Consistency (TCC). TCC provides a model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference consistency. TCC greatly simplifies parallel software by eliminating the need for synchronization using conventional locks and semaphores, along with their complexities. TCC hardware must combine all writes from each transaction region in a program into a single packet and broadcast this packet to the permanent shared memory state atomically as a large block. This simplifies the coherence hardware because it reduces the need for small, low-latency messages and completely eliminates the need for conventional snoopy cache coherence protocols, as multiple speculatively written versions of a cache line may safely coexist within the system. Meanwhile, automatic, hardware-controlled rollback of speculative transactions resolves any correctness violations tha...
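
The commit-and-rollback behavior described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Processor` class, its method names, and the use of a Python dict as "shared memory" are hypothetical and do not come from the paper, which describes a hardware mechanism. The model buffers each transaction's writes, applies them to shared state as one packet at commit, and rolls back any other processor whose read set overlaps that packet.

```python
class Processor:
    """Toy model of one TCC node (hypothetical names, not from the paper)."""

    def __init__(self, memory):
        self.memory = memory        # shared dict standing in for committed state
        self.read_set = set()       # addresses read speculatively
        self.write_buffer = {}      # speculative writes, invisible to other nodes
        self.restarts = 0

    def read(self, addr):
        if addr in self.write_buffer:       # our own speculative value wins
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self.memory[addr]

    def write(self, addr, value):
        self.write_buffer[addr] = value     # buffered until commit

    def rollback(self):
        self.read_set.clear()
        self.write_buffer.clear()
        self.restarts += 1

    def commit(self, others):
        packet = dict(self.write_buffer)    # all writes of the transaction
        self.memory.update(packet)          # applied to shared state as one block
        for other in others:                # "broadcast" to every other node
            if not other.read_set.isdisjoint(packet):
                other.rollback()            # it read data this commit replaced
        self.read_set.clear()
        self.write_buffer.clear()


memory = {"x": 1, "y": 0}
p0, p1 = Processor(memory), Processor(memory)

p0.write("x", p0.read("x") + 1)     # P0 buffers x = 2
p1.write("y", p1.read("x") * 10)    # P1 read the old x and buffers y = 10
p0.commit([p1])                     # P0 commits first, so P1 is rolled back

p1.write("y", p1.read("x") * 10)    # P1 re-executes against the committed x
p1.commit([p0])
print(memory, p1.restarts)
```

In this run P1's stale result (y = 10) never reaches shared memory: after its re-execution the committed state is x = 2, y = 20, with one restart recorded on P1. No locks appear in the "application" code; ordering falls out of who commits first.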

Programming with transactional coherence and consistency (TCC)

ACM SIGARCH Computer Architecture News, 2004

Transactional Coherence and Consistency (TCC) offers a way to simplify parallel programming by executing all code within transactions. In TCC systems, transactions serve as the fundamental unit of parallel work, communication and coherence. As each transaction completes, it writes all of its newly produced state to shared memory atomically, while restarting other processors that have speculatively read stale data. With this mechanism, a TCC-based system automatically handles data synchronization correctly, without programmer intervention. To gain the benefits of TCC, programs must be decomposed into transactions. We describe two basic programming language constructs for decomposing programs into transactions, a loop conversion syntax and a general transaction-forking mechanism. With these constructs, writing correct parallel programs requires only small, incremental changes to correct sequential programs. The performance of these programs may then easily be optimized, based on feedback from real program execution, using a few simple techniques.
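
The idea of converting a sequential loop into transactions can be sketched as follows. The abstract does not show the paper's actual loop syntax, so `transactional_for`, `Txn`, `load`, and `store` are hypothetical names chosen for illustration. The sketch assumes each iteration becomes one transaction, all iterations run speculatively against the same starting state, and commits happen in the original loop order; an iteration that read a location written by an earlier commit is re-executed, which keeps the result identical to the sequential loop.

```python
class Txn:
    """One loop iteration run as a transaction (hypothetical helper)."""

    def __init__(self, memory):
        self._memory = memory
        self.reads = set()      # keys read from committed state
        self.writes = {}        # buffered writes, applied only at commit

    def load(self, key):
        if key in self.writes:
            return self.writes[key]
        self.reads.add(key)
        return self._memory[key]

    def store(self, key, value):
        self.writes[key] = value


def transactional_for(items, body, memory):
    """Run body(txn, item) per item; commit in loop order. Returns retry count."""
    snapshot = dict(memory)
    # Speculative phase: every iteration sees the same starting state,
    # as if all iterations had been started in parallel.
    speculative = []
    for item in items:
        txn = Txn(snapshot)
        body(txn, item)
        speculative.append(txn)

    retries = 0
    committed_keys = set()      # keys written by earlier iterations' commits
    for item, txn in zip(items, speculative):
        if not txn.reads.isdisjoint(committed_keys):
            txn = Txn(memory)   # stale read: re-execute on committed state
            body(txn, item)
            retries += 1
        memory.update(txn.writes)
        committed_keys.update(txn.writes)
    return retries


counts = {"a": 0, "b": 0, "c": 0}

def tally(txn, key):
    txn.store(key, txn.load(key) + 1)

retries = transactional_for(["a", "b", "a", "c"], tally, counts)
print(counts, retries)
```

Here the loop body is unchanged apart from going through `load` and `store`. Only the second "a" iteration conflicts with an earlier commit, so it is the only one re-executed. The retry count is the kind of execution feedback a programmer could use to decide where transaction boundaries or data layout need tuning.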

Characterization of TCC on Chip-Multiprocessors

2005

Transactional Coherence and Consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of parallel work, synchronization, coherence, and consistency. TCC has the potential to simplify parallel program development and optimization by providing a smooth transition from sequential to parallel programs.

Token Coherence for Transactional Memory

2008

Concurrent programming holds the key to fully utilizing the multiple cores provided by chip multiprocessors (CMPs). However, traditional concurrent programming techniques based on locking are hard to write and error-prone. Transactional programming is an attempt to simplify concurrent programming by making transactions the main concurrency construct. Hardware transactional memory systems provide hardware support for executing transactions and rely on modified versions of standard cache coherence protocols to do so. This report presents a token coherence protocol for the transactional memory system TTM [6]. It also presents some comparisons between standard lock-based systems and transactional memory systems. The performance of transactional memory systems is observed to be workload-dependent. However, transactional programming appears to be much easier than programming with locks.

Unrestricted transactional memory: Supporting I/O and system calls within transactions

2006

Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, enabling programmers to exploit the soon-to-be-ubiquitous multi-core designs. Transactions are simply segments of code that are guaranteed to execute without interference from other concurrently-executing threads. The hardware executes transactions in parallel, ensuring non-interference via abort/rollback/restart when conflicts are detected.

The Software Stack for Transactional Memory Challenges and Opportunities

2006

In 1978 C.A.R. Hoare wrote, "developments of processor technology suggest that a multiprocessor machine [...] may become more powerful, capacious, reliable, and economical than a machine which is disguised as a monoprocessor" [11]. Over a quarter of a century later, even laptop computers are multiprocessors thanks to the arrival of multi-core processors. Although there has been progress in the intervening decades, there is still debate about how best to create software to take advantage of multiprocessor systems. The most widely used mechanism for explicitly managing parallelism in programs is locks, whether through Pthread mutexes in C or C++ or synchronizing on objects in Java. However, the traditional problems of locks, such as priority inversion, convoying, and deadlock, have led to the development of alternatives such as non-blocking data structures as well as the use of ACID transactions. Although locks may be the most common method for managing parallelism explicitly, more programmers probably use parallel systems implicitly through the use of transactional database systems. We believe that there is much to be gained from the transactional alternative to locking. We believe that transactions make it easier to write correct parallel programs and, just as important, to write parallel programs that have good performance. Our beliefs are not just based on our own experience with the programming model provided by transactional memory, but also on the experience of the database community. To give a seasoned opinion from that community, here is a quote from Jim Gray: There are many examples of systems that tried and failed to implement fault-tolerant or distributed computations using ad hoc techniques rather than a transaction concept. Subsequently, some of these systems were successfully implemented using transaction techniques. After the fact, the implementers confessed that they simply hit a complexity barrier and could not debug the ad hoc system without a simple unifying concept to deal with exceptions [Borr 1986; Hagmann 1987]. Perhaps even more surprising, the subse-

Transactional memory

Computer architecture news, 1993

A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object. In highly concurrent systems, lock-free data structures avoid common problems associated with conventional locking techniques, including priority inversion, convoying, and difficulty of avoiding deadlock. This paper introduces transactional memory, a new multiprocessor architecture intended to make lock-free synchronization as efficient (and easy to use) as conventional techniques based on mutual exclusion. Transactional memory allows programmers to define customized read-modify-write operations that apply to multiple, independently-chosen words of memory. It is implemented by straightforward extensions to any multiprocessor cache-coherence protocol. Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.