Ting-Lu Huang - Academia.edu
Papers by Ting-Lu Huang
A new lock synchronization algorithm, proposed independently by Craig and the authors, not only eliminates memory contention caused by process spinning but also preserves the first-in-first-out (FIFO) property. A previous result, the MCS lock algorithm, requires both compare&swap and fetch&store instructions; otherwise the FIFO property is lost and starvation may occur. The new algorithm requires only fetch&store. We provide an assertional proof for the new algorithm. Most behavioral proofs of concurrent programs are error-prone because it is difficult and tedious to consider every possible interleaving of the processes. An assertional proof replaces a large number of possible interleavings with a small number of invariants. New techniques in this proof are: an assertional characterization of token bit accessibility; the definition of effective assignments, which brings about the notion of token creation/destruction; the definition of token count, from which the mutual exclusion theorem is derived; and a procedure for constructing a token list that faithfully records the arrival order of lock requests so that FIFO ordering can be enforced.
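As a rough illustration of a queue lock that needs only fetch&store, the sketch below follows the well-known Craig/Landin-Hagersten (CLH) design: each process swaps its own node into the tail and then waits on its predecessor's flag, which yields FIFO order. It is not the authors' algorithm or proof. Python has no hardware fetch&store, so the hypothetical `AtomicRef.fetch_and_store` simulates it with a mutex; all class names are illustrative.

```python
import threading
import time


class AtomicRef:
    """Hypothetical atomic register; fetch&store is simulated with a mutex."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def fetch_and_store(self, new):
        # Atomically write `new` and return the previous value.
        with self._guard:
            old, self._value = self._value, new
            return old


class QNode:
    def __init__(self, locked=False):
        self.locked = locked


class CLHLock:
    def __init__(self):
        # Tail starts at a dummy node that is already released.
        self.tail = AtomicRef(QNode(locked=False))
        self.local = threading.local()  # per-thread node bookkeeping

    def acquire(self):
        node = getattr(self.local, "node", None) or QNode()
        node.locked = True
        pred = self.tail.fetch_and_store(node)  # enqueue in arrival order
        while pred.locked:  # wait only on the predecessor's flag
            time.sleep(0)
        self.local.node, self.local.pred = node, pred

    def release(self):
        node = self.local.node
        node.locked = False  # pass the lock to the successor, if any
        # Reuse the predecessor's node; ours may still be watched by a successor.
        self.local.node = self.local.pred
```

A releasing process adopts its predecessor's node because its own node may still be watched by its successor; this recycling is what lets the lock work with fetch&store alone.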
Two fast mutual exclusion algorithms using read-modify-write and atomic read/write registers are presented. The first uses both compare&swap and fetch&store; the second uses only fetch&store, which is more commonly available than compare&swap. If "time" is measured by counting remote memory references, no better algorithms are possible. We were able to maintain the same level of performance with or without compare&swap; without it, however, fairness degrades from 1-bounded bypass to lockout freedom.
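For comparison, the sketch below is the classic MCS lock of Mellor-Crummey and Scott, which uses fetch&store to enqueue and compare&swap to reset the tail on release. It is only a baseline for the kind of algorithm described above, not either of the paper's algorithms. Both primitives are simulated with a mutex in the hypothetical `AtomicRef` class, since Python lacks them natively.

```python
import threading
import time


class AtomicRef:
    """Hypothetical atomic register; each primitive is made atomic with a mutex."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def fetch_and_store(self, new):
        with self._guard:
            old, self._value = self._value, new
            return old

    def compare_and_swap(self, expected, new):
        # Install `new` only if the register still holds `expected` (identity check).
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self.tail = AtomicRef(None)

    def acquire(self, node):
        node.next = None
        node.locked = True
        pred = self.tail.fetch_and_store(node)
        if pred is not None:
            pred.next = node  # link behind the predecessor
            while node.locked:  # spin on our own node
                time.sleep(0)

    def release(self, node):
        if node.next is None:
            # No visible successor: try to reset the queue to empty.
            if self.tail.compare_and_swap(node, None):
                return
            # A successor swapped itself in but has not linked yet; wait for it.
            while node.next is None:
                time.sleep(0)
        node.next.locked = False
```

Each waiter spins on the `locked` field of its own node; on a distributed shared memory machine that node would be allocated in the waiter's local memory.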
Journal of Parallel and Distributed Computing, May 1, 1997
This paper establishes the necessary and sufficient condition for resetting vector clocks correctly, that is, so that their functionality is preserved. A clock reset protocol is presented, and its applicability and limitations are discussed. Our result indicates that, for some applications, clock overflow can be prevented entirely by carefully choosing the condition for initiating the clock reset protocol.
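The sketch below shows the standard vector clock rules whose functionality a reset protocol must preserve: increment the local component on each event and take the component-wise maximum on receipt. The overflow check `near_overflow` and its `limit` parameter are hypothetical placeholders; the paper's actual reset condition is not reproduced here.

```python
from typing import List


class VectorClock:
    """Vector clock for process `pid` in a system of `n` processes."""

    def __init__(self, n: int, pid: int):
        self.pid = pid
        self.v: List[int] = [0] * n

    def tick(self) -> List[int]:
        # Local event or send: advance our own component, return a timestamp copy.
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, ts: List[int]) -> List[int]:
        # Merge the sender's timestamp component-wise, then count the receive event.
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        return self.tick()


def happened_before(a: List[int], b: List[int]) -> bool:
    """a -> b iff a <= b component-wise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def near_overflow(ts: List[int], limit: int) -> bool:
    # Hypothetical trigger: some component has reached a fixed counter limit.
    return max(ts) >= limit
```

Two timestamps for which `happened_before` is false in both directions belong to concurrent events; a reset must not change any such verdict.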
International Journal of Software Engineering and Knowledge Engineering, Dec 1, 1995
Concurrent programs are more difficult to test than sequential programs because of their non-deterministic behavior. An execution of a concurrent program non-deterministically exercises a sequence of synchronization events called a synchronization sequence (or SYN-sequence). Non-deterministic testing of a concurrent program P executes P with a given input many times in order to exercise distinct SYN-sequences. In this paper, we present a new testing approach called reachability testing. If every execution of P with input X terminates, reachability testing of P with input X derives and executes all possible SYN-sequences of P with input X. We show how to perform reachability testing of concurrent programs that use read and write operations. We also present results of empirical studies comparing reachability testing with non-deterministic testing. Our results indicate that reachability testing has advantages over non-deterministic testing.
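To make "all possible SYN-sequences" concrete, the sketch below brute-forces every interleaving of two threads that each perform the non-atomic increment `x = x + 1` as separate read and write steps. This exhaustive enumeration is only an illustration; the paper's procedure for deriving SYN-sequences is not reproduced, and the event encoding is an assumption.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each sequence's order."""
    n = len(a) + len(b)
    for picks in combinations(range(n), len(a)):
        chosen = set(picks)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute a schedule of (op, thread) events against shared variable x."""
    x, local = 0, {}
    for op, tid in schedule:
        if op == "read":
            local[tid] = x  # thread copies x into a register
        else:
            x = local[tid] + 1  # thread writes back its incremented copy
    return x


# Each thread performs the non-atomic increment x = x + 1.
T1 = [("read", 1), ("write", 1)]
T2 = [("read", 2), ("write", 2)]
outcomes = {run(s) for s in interleavings(T1, T2)}
```

Repeated random runs may never hit a lost-update schedule, whereas enumerating all six interleavings exposes both final values, 1 and 2.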
In distributed shared memory multiprocessors, remote memory accesses generate processor-to-memory traffic, which may result in a bottleneck. It is therefore important to design algorithms that minimize the number of remote memory accesses. We establish a lower bound of 3 on remote access time complexity for mutual exclusion algorithms in a model where processes communicate by means of a general read-modify-write primitive. Since a general read-modify-write primitive generalizes all atomic primitives that access at most one shared variable, our lower bound holds for any set of such primitives. Furthermore, this lower bound is tight because it matches the upper bound of Huang's algorithm proposed in 1999.
Busy waiting is common in shared memory mutual exclusion algorithms. To reduce the memory contention incurred by busy waiting, we follow the concept of local spin popularized by Mellor-Crummey and Scott and propose a generic approach for adding local spin to mutual exclusion algorithms in the atomic read/write model. Taking the Eisenberg-McGuire algorithm as an example, we obtain two local-spin versions. The first follows directly from the generic approach. The second, which uses better inter-process communication made possible by an in-depth understanding of the algorithm, significantly reduces the number of remote memory accesses.
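The toy model below (an assumption, not the Eisenberg-McGuire transformation) counts remote accesses in a distributed shared memory where each variable lives in one process's memory module. A waiter that polls a flag owned by another process pays one remote read per poll, while a waiter that spins on its own flag pays nothing and the releaser pays a single remote write. All names are hypothetical.

```python
class DSM:
    """Toy distributed-shared-memory model that counts remote accesses.

    Each variable lives in the memory module of one owner process; an access
    by any other process is counted as remote.
    """

    def __init__(self):
        self.owner = {}
        self.value = {}
        self.remote = {}

    def alloc(self, name, owner, value):
        self.owner[name], self.value[name] = owner, value

    def _count(self, pid, name):
        if self.owner[name] != pid:
            self.remote[pid] = self.remote.get(pid, 0) + 1

    def read(self, pid, name):
        self._count(pid, name)
        return self.value[name]

    def write(self, pid, name, value):
        self._count(pid, name)
        self.value[name] = value


def wait(mem, waiter, flag, polls_before_release, releaser):
    """Waiter polls `flag`; releaser sets it after `polls_before_release` polls.

    Returns the number of remote accesses charged to the waiter.
    """
    polls = 0
    while not mem.read(waiter, flag):
        polls += 1
        if polls == polls_before_release:
            mem.write(releaser, flag, True)
    return mem.remote.get(waiter, 0)
```

With 100 unsuccessful polls, spinning on a remote flag costs the waiter 101 remote reads, while spinning locally costs it none, however long it waits.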
Execution efficiency of logic programs can be improved in two major directions: parallel processing for more computation power and control guidance for less non-determinism. Parallel execution of a logic program represented as a connection graph must be guarded against logical inconsistency. Enforcing the Bernstein conditions can prevent such problems but results in an unacceptable reduction of parallelism. A subcycle-level parallel procedure with step-wise purity deletions is designed to remedy these problems. The concurrent step-wise purity deletion has been shown to preserve much of the deletion power of sequential purity deletion. Recursion is what makes a logic program non-trivial. Fact propagation is proposed to reduce run-time recursive iteration through a compile-time analysis of the recursive loops. Herbrand expansion trees provide a concise organization for the increasingly large number of unit clauses during the propagation. Symbolic execution through the lo...
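As a simplified, propositional illustration of sequential purity deletion (the connection-graph setting above also involves unification, which is omitted here), the sketch below repeatedly deletes any clause containing a pure literal, i.e. one whose complement occurs in no remaining clause. Literals are encoded as strings with a leading `~` for negation; this encoding is an assumption.

```python
def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def purity_delete(clauses):
    """Repeatedly delete clauses that contain a pure literal.

    A literal is pure when its complement occurs in no remaining clause, so it
    can never be resolved away and its clause cannot contribute to a refutation.
    """
    remaining = [frozenset(c) for c in clauses]
    while True:
        present = set().union(*remaining)
        kept = [c for c in remaining if all(negate(lit) in present for lit in c)]
        if len(kept) == len(remaining):
            return kept  # fixed point: no clause contains a pure literal
        remaining = kept
```

Deleting one clause can make further literals pure, which is why the loop runs to a fixed point; in the first test below the whole clause set disappears after three rounds of deletions, while the contradictory pair {p}, {~p} survives untouched.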
Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003)
Two fast mutual exclusion algorithms using read-modify-write and atomic read/write registers are presented. The first uses both compare&swap and fetch&store; the second uses only fetch&store, which is more commonly available than compare&swap. If "time" is measured by counting remote memory references, no better algorithms are possible. We were able to maintain the same level of performance with or without compare&swap; without it, however, fairness degrades from 1-bounded bypass to lockout freedom.
Proceedings Seventh International Conference on Parallel and Distributed Systems (Cat. No.PR00568)
Many nonblocking algorithms have been proposed for shared queues. Previous studies indicate that link-based algorithms perform best. However, these algorithms have a memory management problem: a dequeued node cannot be freed or reused without proper handling. The problem is usually overlooked; one simply assumes the existence of a lower-level mechanism that takes care of all the details. Employing such a mechanism incurs significant overhead, so link-based queues may not perform as well as claimed. This paper proposes a new nonblocking queue algorithm based on a finite array. Compared with the link-based algorithms, the new algorithm provides the same degree of concurrency without being subject to the memory management problem, which suggests good performance.
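For a flavor of array-based queues that sidestep node reclamation, the sketch below follows the bounded multi-producer/multi-consumer queue popularized by Dmitry Vyukov, in which each slot carries a sequence number that tells producers and consumers whether it is free or full. It is a different algorithm from the paper's, and strictly speaking it is not lock-free: a producer that stalls after claiming a slot delays consumers of that slot. Compare&swap is simulated with a mutex in the hypothetical `AtomicInt` class.

```python
import threading


class AtomicInt:
    """Hypothetical atomic integer; compare&swap is simulated with a mutex."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


class BoundedQueue:
    """Array-based MPMC queue with per-slot sequence numbers (Vyukov style)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.seq = list(range(capacity))  # slot i is free for enqueue position i
        self.data = [None] * capacity
        self.head = AtomicInt(0)  # next position to dequeue
        self.tail = AtomicInt(0)  # next position to enqueue

    def try_enqueue(self, item):
        while True:
            pos = self.tail.load()
            slot = pos % self.capacity
            diff = self.seq[slot] - pos
            if diff == 0:  # slot is free for this position: try to claim it
                if self.tail.compare_and_swap(pos, pos + 1):
                    self.data[slot] = item
                    self.seq[slot] = pos + 1  # publish: slot now full
                    return True
            elif diff < 0:  # slot still holds an item from the previous lap
                return False  # queue is full
            # otherwise another producer claimed pos; reload and retry

    def try_dequeue(self):
        while True:
            pos = self.head.load()
            slot = pos % self.capacity
            diff = self.seq[slot] - (pos + 1)
            if diff == 0:  # slot is full for this position: try to claim it
                if self.head.compare_and_swap(pos, pos + 1):
                    item = self.data[slot]
                    self.seq[slot] = pos + self.capacity  # free for next lap
                    return True, item
            elif diff < 0:  # nothing published here yet
                return False, None  # treat as empty
            # otherwise another consumer claimed pos; reload and retry
```

Because slots are reused in place, no dequeued node ever needs to be freed, which is the memory management issue the abstract raises for link-based queues.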
This paper presents a scalable scheme for ensuring causal ordering of messages passed among processes in large-scale distributed systems. Previously proposed approaches, categorized as centralized or fully distributed, either place the entire processing load on a single process or incur message overhead quadratic in the number of participating processes. These solutions scale poorly to large systems. Our scheme organizes the entire system as hierarchical clusters, within which any of the previously proposed approaches can be employed. Message causality is maintained by enforcing rules that govern how messages are propagated from origins to destinations. This approach places much less processing load on hot-spot sites than the centralized approach, or, alternatively, requires much less message space overhead than the fully distributed approach. We show that, by setting the cluster size appropriately, the message space overhead can be reduced to a linear or even a logarithmic function of the number of processes involved. Our approach is suitable for large non-proprietary networks.
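For reference, the sketch below shows the basic vector-clock delivery rule for causal broadcast (the Birman-Schiper-Stephenson rule), the kind of fully distributed building block a cluster could run internally. It handles broadcast only and does not reproduce the paper's hierarchical propagation rules; the class and method names are hypothetical.

```python
class CausalProcess:
    """Causal broadcast delivery using vector clocks (BSS-style rule)."""

    def __init__(self, n, pid):
        self.pid = pid
        self.vc = [0] * n  # vc[k]: broadcasts from process k delivered here
        self.pending = []  # received messages not yet deliverable
        self.delivered = []  # payloads in delivery order

    def broadcast(self, payload):
        # The sender delivers its own message at once and stamps it.
        self.vc[self.pid] += 1
        self.delivered.append(payload)
        return (self.pid, list(self.vc), payload)

    def _deliverable(self, msg):
        sender, ts, _ = msg
        # Next message from sender, and everything it depends on already seen.
        return ts[sender] == self.vc[sender] + 1 and all(
            ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender
        )

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:  # one delivery may unblock buffered messages
            progress = False
            for m in list(self.pending):
                if self._deliverable(m):
                    self.pending.remove(m)
                    self.vc[m[0]] += 1
                    self.delivered.append(m[2])
                    progress = True
```

Each message carries a timestamp of n integers; a receiver that gets m2 before m1, on which m2 causally depends, buffers m2 until m1 has been delivered.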
Causal message ordering in the context of group communication ensures that all message receivers observe a consistent ordering of events affecting the group as a whole. This paper presents a scalable causal multicast protocol for mobile distributed computing systems. In our protocol, only some of the mobility agents in the system are involved in group computations, so the size of the control information in messages can be kept small. Our protocol can qualitatively outperform its counterparts in terms of communication overhead and handoff complexity. An analytical model is also developed to evaluate the protocol, and the performance results show that it is promising.
11th International Conference on Parallel and Distributed Systems (ICPADS'05)
For shared memory systems with time and resource constraints, such as embedded real-time systems, a mutual exclusion mechanism that is both fair and space-efficient can be very useful. In this paper, we present a bounded-bypass algorithm that uses only two shared variables, regardless of the number of contending processes, and is built from the fetch&store operation together with atomic read/write. We also show that, to achieve the same level of fairness with the same set of operations, two shared variables are necessary; our algorithm is therefore space-optimal.
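The sketch below is a ticket lock, shown only to illustrate that a FIFO lock can use a constant number of shared variables (here two) regardless of the number of processes. It relies on fetch&add rather than the paper's fetch&store plus read/write, so it is not the paper's algorithm, and all waiters spin on the same variable. Atomicity is simulated with a mutex in the hypothetical `AtomicInt` class.

```python
import threading
import time


class AtomicInt:
    """Hypothetical atomic integer; fetch&add is simulated with a mutex."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def store(self, value):
        self._value = value

    def fetch_and_add(self, delta):
        with self._guard:
            old = self._value
            self._value += delta
            return old


class TicketLock:
    """FIFO lock built from exactly two shared variables, independent of N."""

    def __init__(self):
        self.next_ticket = AtomicInt(0)  # shared variable 1
        self.now_serving = AtomicInt(0)  # shared variable 2

    def acquire(self):
        my_ticket = self.next_ticket.fetch_and_add(1)  # take a place in line
        while self.now_serving.load() != my_ticket:
            time.sleep(0)
        return my_ticket

    def release(self, my_ticket):
        # Only the holder writes now_serving, so a plain store suffices.
        self.now_serving.store(my_ticket + 1)
```

Processes are served strictly in ticket order, so no process can be bypassed by a later arrival.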
Proceedings 1998 International Conference on Parallel and Distributed Systems (Cat. No.98TB100250)
Three fast mutual exclusion algorithms using read-modify-write and atomic read/write registers are presented in sequence, each improving on the previous one. The last algorithm is shown to be optimal in minimizing the number of remote memory accesses required in a resource busy period. Remote memory accesses are the key cause of the memory access bottleneck in large shared-memory multiprocessors. The algorithm is therefore particularly suitable for such systems running applications with small critical sections and frequent resource requests.
Proceedings of 1994 International Conference on Parallel and Distributed Systems
A new lock synchronization algorithm, proposed independently by Craig and the authors, not only eliminates memory contention caused by process spinning but also preserves the first-in-first-out (FIFO) property. A previous result, the MCS lock algorithm, requires both compare&swap and fetch&store instructions; otherwise the FIFO property is lost and starvation may occur. The new algorithm requires only fetch&store. We provide an assertional proof for the new algorithm. Most behavioral proofs of concurrent programs are error-prone because it is difficult and tedious to consider every possible interleaving of the processes. An assertional proof replaces a large number of possible interleavings with a small number of invariants. New techniques in this proof are: an assertional characterization of token bit accessibility; the definition of effective assignments, which brings about the notion of token creation/destruction; the definition of token count, from which the mutual exclusion theorem is derived; and a procedure for constructing a token list that faithfully records the arrival order of lock requests so that FIFO ordering can be enforced.
International Conference on Information Networking, 1999
Causal multicast is required by several distributed applications. In a mobile computing environment, it is especially important for applications that involve human interaction from several locations, for example, teleconferencing. In this paper, we present a causal multicast algorithm whose message overhead is independent of the number of mobile hosts and which handles connections and disconnections easily. It also handles...
IEEE Transactions on Parallel and Distributed Systems - TPDS, 1998
Journal of Parallel and Distributed Computing, 2006
In distributed shared memory multiprocessors, remote memory references generate processor-to-memory traffic, which may result in a bottleneck. It is therefore important to design algorithms that minimize the number of remote memory references. We establish a lower bound of three on remote reference time complexity for mutual exclusion algorithms in a model where processes communicate by means of a general read-modify-write primitive that accesses at most one shared variable in one instruction. Since the general read-modify-write primitive is a generalization of a variety of atomic primitives that have been implemented in multiprocessor systems, our lower bound holds for all mutual exclusion algorithms that use such primitives. Furthermore, this lower bound is shown to be tight by presenting an algorithm with the matching upper bound.
Journal of Parallel and Distributed Computing, 1997
This paper establishes the necessary and sufficient condition for resetting vector clocks correctly, that is, so that their functionality is preserved. A clock reset protocol is presented, and its applicability and limitations are discussed. Our result indicates that, for some applications, clock overflow can be prevented entirely by carefully choosing the condition for initiating the clock reset protocol.
IEEE Transactions on Parallel and Distributed Systems, 1998