Deadlock Detection Scheduling for Distributed Processes in the Presence of System Failures

Akikazu IZUMI†, Tadashi DOHI‡ and Naoto KAIO‡
† Department of Information Engineering, Graduate School of Engineering
Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, 739-8527 Japan
Email: dohi@rel.hiroshima-u.ac.jp
‡ Department of Economic Informatics, Faculty of Economic Sciences
Hiroshima Shudo University, 1-1-1 Ohzukahigashi, Asaminami-ku, Hiroshima, 739-3195, Japan
kaio@shudo-u.ac.jp

Abstract

Deadlocks should be controlled effectively through detection and resolution, but they may sometimes lead to a serious system failure. This implies that deadlock detection scheduling should be designed from the viewpoint of not only the performance trade-off between overall message usage and deadlock persistence time, but also the prevention of system failures. In this paper, we reformulate the deadlock detection scheduling problem of Ling et al. (2006) in the presence of system failures, and derive the optimal deadlock detection time that minimizes the long-run average cost per unit time. By introducing the message complexities of the deadlock detection and resolution algorithms being used, we investigate the asymptotically optimal frequency of deadlock detection scheduling in terms of the number of distributed processes through the well-known Landau notation.

I. INTRODUCTION

Deadlock can occur whenever two or more processes compete for limited resources, where the processes are allowed to acquire and hold a resource and prevent others from using it while they wait for other resources [5]. In particular, the distributed deadlock problem arises from resource contention introduced by concurrent processes in distributed computing [8],[9],[16]. In general, the problem of deadlocks can be handled from four points of view: prevention, avoidance, detection and resolution, which closely parallel the approaches taken in fault-tolerant computing. Unfortunately, complete deadlock prevention and deadlock avoidance are often infeasible, and have several disadvantages: resources must be acquired because they might be used, not because they will be used, and an aborted process may not actually have been involved in a deadlock [6],[13]. On the other hand, the deadlock detection/resolution strategy is known to be acceptable as an optimistic and feasible solution to the deadlock problem [1],[7],[10],[12],[14],[15],[18].

Deadlock detection is used to find and resolve actual deadlocks, but still requires a costly operation on a wait-for-graph (WFG), whose nodes and edges represent processes and the blockages or dependencies between them, respectively. One way to detect deadlocks in distributed systems is for each site to construct a local WFG on the part it knows about. A site that suspects deadlock initiates a global snapshot protocol that constructs a consistent global state of the distributed system, which corresponds to a deadlock if the union of the local WFGs has a cycle. Chandy et al. [2] propose the use of both local WFGs to detect local deadlocks and probes to determine the existence of global deadlocks. The main idea of the deadlock detection/resolution strategy is that it does not preclude the possibility of deadlock occurring, but leaves the burden of minimizing the adverse impact of deadlock to the deadlock detection/resolution mechanism. Hence, the presence of deadlocks is detected by a periodic initiation of a deadlock detection algorithm and then resolved by a deadlock resolution algorithm [4],[17].

There is a substantial tradeoff between the cost of deadlock detection and that of deadlock resolution [9],[16], which significantly affects the overall performance of deadlock handling. Excessive initiation of deadlock detection increases overall message usage, resulting in degraded system performance in the absence of deadlocks, while insufficient initiation of deadlock detection increases the deadlock persistence time, resulting in an increased deadlock resolution cost in the presence of deadlocks. Chen and Ling [3] and Ling et al. [11] take account of the above tradeoff, and consider a stochastic model to derive an optimal deadlock detection frequency that minimizes the long-run average cost per unit time, which is determined by the message complexities of the best known deadlock detection and resolution algorithms, as well as the rate of deadlock formation, denoted by $\lambda$. More specifically, they show that an asymptotically optimal frequency of deadlock detection scheduling that minimizes the overall message overhead is given by $O\left((\lambda n)^{1/3}\right)$, where $n$ is the number of distributed processes.

On the other hand, in practice a number of system failures result from a series of deadlocks in real operating systems and relational database applications. Once a deadlock occurs, one of the processes must be forced to terminate and restart after a constant time. Hence, if the deadlock can be detected and resolved completely, it may seem at first glance that the occurrence of deadlocks does not lead to a system failure. However, when a series of deadlocks occurs continuously, the number of distributed processes waiting for execution may increase, and as a result the number of restarts may also increase. In such an unstable state, system failures may occur with some positive probability. The most vivid example is the outage of a database system at the Tokyo Stock Exchange, Japan, on March 10, 2008, where the trading of two stocks was stopped for four hours. Since the upper limit on the number of restarts was set to 100 in the stock trading system, buy and sell orders for these stocks could not be processed during that period because of a safety lock. The main cause of this accident was of course a design bug in setting the upper limit of restarts. However, it is worth mentioning that the occurrence of a series of deadlocks certainly triggered the system failure. In other words, deadlock detection scheduling for distributed processes should be considered in the presence of system failures by modeling the relationship between the cumulative number of deadlocks and the associated system failure.

In this paper, we reformulate the deadlock detection scheduling problem of Ling et al. [11] in the presence of system failures, and derive the optimal deadlock detection time that minimizes the long-run average cost per unit time. Under the assumption that the system failure occurs when the cumulative number of deadlocks reaches a random level, we consider four cases in terms of deadlock detection and resolution costs. In a fashion similar to [11], and by introducing the message complexities of the deadlock detection and resolution algorithms used in practice, as well as the rate of deadlock formation, we investigate the asymptotically optimal frequency of deadlock detection scheduling in terms of the number of distributed processes through the well-known Landau notation. Numerical illustrations are devoted to a sensitivity analysis of the model parameters on the optimal deadlock detection schedule and its associated long-run average cost per unit time. This paper is an extension of Chen and Ling [3] and Ling et al. [11], and focuses on the situation where a more severe system-down state triggered by deadlocks may occur in distributed systems.

Before describing the work by Chen and Ling [3] and Ling et al. [11], we give some definitions on the technical terms used here:
(i) A deadlock refers to a circular-wait condition in which a set of processes wait indefinitely for resources held by each other.
(ii) The deadlock size is defined as the total number of blocked processes (BPs) involved in the deadlock, where a BP is a process that waits indefinitely on other processes to progress. BPs are divided into deadlocked BPs and transitively blocked BPs: the former belong to a cycle in the WFG, whereas the latter do not belong to any cycle in the WFG.
(iii) Two deadlocks are independent if they do not share any deadlocked process.
In a distributed system, let $S=\{S_1, S_2, \cdots\}$ be the time instants at which independent deadlocks initially occur. Define the persistence time of the $i$-th deadlock by $t_p = t - S_i$ if $t > S_i$ and $t_p = 0$ otherwise, which is the time interval between the present time $t$ and the time at which the deadlock is initially formed. It is intuitive to assume that both the size of a deadlock (the number of blocked processes involved) and the message complexity needed to resolve it increase with the persistence time. Figure 1 illustrates two snapshots of a WFG to show the deadlock size increasing with the deadlock persistence time. In this example, at time $=1$, there are three circularly deadlocked processes (X, Y, Z) and two transitively blocked processes. Next, at time $=2$, there are seven circularly deadlocked processes.

Fig. 1. Increasing deadlock size with deadlock persistence time.

Let $n$ and $n_D(\cdot)$ be the total number of processes and the size of the deadlock, respectively. Since the deadlock size can be considered as a function of the persistence time, we represent it by $n_D(t_p)$, which is assumed to be differentiable with respect to $t_p$, i.e., $d n_D(t_p)/d t_p = n_D'(t_p)$ exists. Similar to [11], we make the following assumptions:
(A.1) $n_D(0) = 0$,
(A.2) $n_D'(t_p) > 0$ for $t_p > 0$,
(A.3) $n_D(t_p) \leq n$.

The boundedness condition (A.3) on $n_D(t_p)$ is not restrictive, because the number of distributed processes is sufficiently large.

When we focus on distributed deadlock detection and resolution algorithms, it is important to connect these algorithms to the concept of cost. Kshemkalyani and Singhal [8] develop a distributed detection algorithm that achieves the worst-case message complexity $O(2n^2)$ and the time complexity $O(2n)$, and which has been regarded as the best existing deadlock detection algorithm. On the other hand, the deadlock resolution cost is in general measured in terms of either time complexity or message complexity. Let $m$ be the number of processes having priorities greater than those of the deadlocked processes. To date, the deadlock resolution algorithm by de Mendivil et al. [4] is known as the best one, with the message complexity $O(m n_D^2)$, where the worst-case message complexity can be written as $O(n^3)$ because the eventual deadlock size $n_D$ is bounded by $n$, so that $m = O(n)$ and $n_D = O(n)$. Chen and Ling [3] and Ling et al. [11] revisit the message complexity achieved by the algorithm of de Mendivil et al. [4] and represent the dependency $O(m n_D^2) = O(n n_D^2)$ by $O(c n n_D^2)$, where $c$ is an arbitrary constant.

Suppose that deadlocks form in accordance with a homogeneous Poisson process with intensity parameter $\lambda$ $(>0)$ and that the deadlocks are detected periodically at intervals of length $T$, where $1/T$ is called the frequency of deadlock detection. Without any loss of generality, it is assumed that the detected deadlocks can be resolved immediately. Let $C_D$ and $C_R(t)$ denote the detection cost per unit deadlock and the resolution cost per unit deadlock, respectively, where $C_R(t)$ is a strictly increasing function of time $t$ and depends on the size of the detected deadlocks. Then, the long-run average cost per unit time is given by

$$C_0(T) = \frac{C_D}{T} + \frac{\lambda \int_0^T C_R(t)\,dt}{T}. \tag{1}$$

Chen and Ling [3] and Ling et al. [11] derive the optimal deadlock detection time $T^*$ which minimizes the long-run average cost per unit time in Eq. (1). Though they did not refer to the uniqueness of the optimal deadlock detection schedule, it is easy to prove the uniqueness. They also show that the periodic (equidistant) deadlock detection schedule with a fixed $T^*$ is always better than the randomized deadlock detection schedule in which the detection time is given by an independent and identically distributed random variable, in terms of minimizing the long-run average cost per unit time. Chen and Ling [3] and Ling et al. [11] also derive an asymptotically optimal frequency of deadlock detection scheduling by substituting $2n^2$ and $c n\{n_D(t)\}^2$ into $C_D$ and $C_R(t)$, respectively, under the assumption that no system failure occurs.
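As a rough numerical illustration of Eq. (1), the following sketch evaluates $C_0(T)$ and locates its minimizer. It assumes the cost functions $C_D = n^2$ and $C_R(t) = n^3(1-e^{-t})$ that appear later in the numerical examples; the function names `c0`, `h0` and `optimal_t0` are hypothetical. The minimizer is found from the first-order condition $\lambda\{T C_R(T) - \int_0^T C_R(t)\,dt\} = C_D$, which follows from differentiating Eq. (1) and has an increasing left-hand side whenever $C_R$ is increasing.

```python
import math


def c0(T, lam, n):
    """Long-run average cost C_0(T) of Eq. (1) for the assumed cost functions."""
    c_d = n ** 2                                  # assumed C_D = n^2
    int_c_r = n ** 3 * (T - 1.0 + math.exp(-T))   # int_0^T n^3 (1 - e^{-t}) dt
    return c_d / T + lam * int_c_r / T


def h0(T, lam, n):
    """First-order condition lam * (T*C_R(T) - int_0^T C_R) - C_D; increasing in T."""
    return lam * n ** 3 * (1.0 - (1.0 + T) * math.exp(-T)) - n ** 2


def optimal_t0(lam, n, tol=1e-12):
    """Bisection on h0. For these costs a finite root exists only if lam * n > 1."""
    if lam * n <= 1.0:
        return math.inf
    lo, hi = 1e-9, 1.0
    while h0(hi, lam, n) <= 0.0:   # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h0(mid, lam, n) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    lam, n = 0.01, 500
    t_star = optimal_t0(lam, n)
    print(f"T* = {t_star:.4f}, C0(T*) = {c0(t_star, lam, n):.1f}")
```

Under this first-order condition the minimum cost equals $\lambda C_R(T^*)$, which gives a quick sanity check on the computed root.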

III. EXTENDED MODELS

A. A Fundamental Failure Model

Suppose that the system failure occurs when the total number of deadlocks reaches a level $k$ $(=1,2,\ldots)$. Let $\tau_k$ be the time to the $k$-th deadlock and let $N(T)$ be the number of deadlocks formed up to time $T$. Then, from the well-known result on the Poisson process, we have

$$\begin{aligned} P(N(T) \geq k) &= P(\tau_k \leq T) = \int_0^T \frac{\lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!}\,dx \\ &= \sum_{i=k}^{\infty} \frac{(\lambda T)^i e^{-\lambda T}}{i!}. \end{aligned}$$
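The identity between the Erlang distribution function of $\tau_k$ and the Poisson tail probability can be checked numerically. The sketch below is only an illustration: the helper names `poisson_tail` and `erlang_cdf` are hypothetical, and the integral is approximated with a composite Simpson rule.

```python
import math


def poisson_tail(k, lam, T):
    """P(N(T) >= k) for a Poisson process with rate lam, via the complement sum."""
    head = sum(math.exp(-lam * T) * (lam * T) ** i / math.factorial(i)
               for i in range(k))
    return 1.0 - head


def erlang_cdf(k, lam, T, steps=20000):
    """P(tau_k <= T): Simpson integration of the Erlang(k, lam) density on [0, T]."""
    def density(x):
        return lam ** k * x ** (k - 1) * math.exp(-lam * x) / math.factorial(k - 1)

    h = T / steps                      # steps must be even for Simpson's rule
    total = density(0.0) + density(T)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, poisson_tail(k, 0.5, 3.0), erlang_cdf(k, 0.5, 3.0))
```

Both columns printed by the usage example should agree up to the quadrature error.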

The long-run average cost per unit time is formulated as

$$\begin{aligned} C_1(T \mid k \geq 1) = {} & \frac{C_D + \mathrm{E}\left[\int_0^{\tau_k} C_R(\tau_k - t)\,\lambda\,dt;\ T \geq \tau_k\right]}{\mathrm{E}\left[\min(\tau_k, T)\right]} \\ & + \frac{\mathrm{E}\left[\int_0^{T} C_R(T - t)\,\lambda\,dt;\ T < \tau_k\right]}{\mathrm{E}\left[\min(\tau_k, T)\right]}, \end{aligned}$$

provided that the failure level $k$ is known in advance. Since the failure level on the cumulative number of deadlocks cannot be specified in general, however, it can be regarded as an integer-valued random variable. If the failure level $k$ possesses the so-called memoryless property, it is appropriate to assume that $k$ obeys the geometric distribution with failure probability $p$ $(0<p<1)$. Then, we obtain the unconditional long-run average cost per unit time:

$$C_1(T) = \frac{\lambda p\, C_D + \lambda^3 p^2 \int_0^T e^{-\lambda p x}\left[\int_0^x C_R(y)\,dy\right]dx + \lambda^2 p\, e^{-\lambda p T} \int_0^T C_R(y)\,dy}{1 - e^{-\lambda p T}}. \tag{4}$$
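The denominator $1-e^{-\lambda p T}$ in Eq. (4) can be traced back to the distribution of the failure time. The short derivation below is a sketch that assumes the geometric probability mass function $P(k=j)=p(1-p)^{j-1}$, $j=1,2,\ldots$, a parameterization not spelled out in the text; under it, the time to failure is exponentially distributed with rate $\lambda p$.

```latex
\begin{aligned}
P(\tau_k > T)
  &= \sum_{j=1}^{\infty} p(1-p)^{j-1} \sum_{i=0}^{j-1} \frac{(\lambda T)^i e^{-\lambda T}}{i!}
   = \sum_{i=0}^{\infty} \frac{(\lambda T)^i e^{-\lambda T}}{i!} \sum_{j=i+1}^{\infty} p(1-p)^{j-1} \\
  &= e^{-\lambda T} \sum_{i=0}^{\infty} \frac{\{\lambda T (1-p)\}^i}{i!}
   = e^{-\lambda p T}, \\
\mathrm{E}\left[\min(\tau_k, T)\right]
  &= \int_0^T P(\tau_k > x)\,dx
   = \frac{1 - e^{-\lambda p T}}{\lambda p}.
\end{aligned}
```

Multiplying the numerator and the denominator of the conditional cost by $\lambda p$ then yields the factor $\lambda p$ in front of $C_D$ and the denominator $1-e^{-\lambda p T}$.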

Define the function

$$g_1(T) = \lambda^2 p\, C_R(T) B_1(T) - \lambda p A_1(T),$$

with the limit

$$g_1(\infty) = \lambda^2 p\, C_R(\infty) - \lambda p \left\{\lambda p\, C_D + \lambda^3 p^2 \int_0^{\infty} e^{-\lambda p x}\left[\int_0^x C_R(y)\,dy\right]dx\right\},$$

where $A_1(T)$ and $B_1(T)$ denote the numerator and the denominator of Eq. (4), respectively, so that $C_1(T) = A_1(T)/B_1(T)$. Our concern is to derive the optimal periodic deadlock detection time $T^*$ minimizing the long-run average cost per unit time $C_1(T)$.
Lemma 1: $\lim_{p \to 0} C_1(T) = C_0(T)$.
Theorem 1: If $g_1(\infty) > 0$, there exists a unique optimal deadlock detection time $T^*$ $(0 < T^* < \infty)$ satisfying the non-linear equation $g_1(T^*) = 0$, and the corresponding long-run average cost per unit time is given by

$$C_1(T^*) = \lambda C_R(T^*).$$

Otherwise, i.e., if $g_1(\infty) \leq 0$, we have $T^* \to \infty$, so that it is always optimal to detect the deadlock at the system failure time.

The above results hold independently of the specific forms of the deadlock detection and resolution costs. We call this Case 1 in this paper. In the remaining part of this section, we consider three special cases on cost parameters (Case 2, Case 3 and Case 4).
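For Case 1, the optimal detection time can be approximated numerically by solving $g_1(T)=0$. The sketch below is an illustration under stated assumptions: it borrows $C_D=n^2$ and $C_R(t)=n^3(1-e^{-t})$ from the numerical examples, evaluates the outer integral of Eq. (4) with a composite Simpson rule, and uses bisection. The names `simpson`, `case1_model` and `solve_t_star` are hypothetical.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def case1_model(lam, p, n):
    """Return (C_1, g_1, C_R) for the assumed costs C_D = n^2, C_R(t) = n^3 (1 - e^{-t})."""
    c_d = n ** 2

    def c_r(t):
        return n ** 3 * (1.0 - math.exp(-t))

    def int_c_r(x):                       # closed form of int_0^x C_R(y) dy
        return n ** 3 * (x - 1.0 + math.exp(-x))

    def a1(T):                            # numerator of Eq. (4)
        outer = simpson(lambda x: math.exp(-lam * p * x) * int_c_r(x), 0.0, T)
        return (lam * p * c_d + lam ** 3 * p ** 2 * outer
                + lam ** 2 * p * math.exp(-lam * p * T) * int_c_r(T))

    def b1(T):                            # denominator of Eq. (4)
        return -math.expm1(-lam * p * T)

    def c1(T):
        return a1(T) / b1(T)

    def g1(T):
        return lam ** 2 * p * c_r(T) * b1(T) - lam * p * a1(T)

    return c1, g1, c_r


def solve_t_star(g, t_max=1e4, tol=1e-10):
    """Bisection on an increasing g; returns inf if g stays non-positive up to t_max."""
    lo, hi = 1e-9, 1.0
    while g(hi) <= 0.0:
        hi *= 2.0
        if hi > t_max:
            return math.inf
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    lam, p, n = 0.01, 0.1, 500
    c1, g1, c_r = case1_model(lam, p, n)
    t_star = solve_t_star(g1)
    print(f"T* = {t_star:.4f}, C1(T*) = {c1(t_star):.1f}, lam*C_R(T*) = {lam * c_r(t_star):.1f}")
```

The last two printed values should coincide, in line with the cost expression stated in Theorem 1.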

B. Special Case with Different Resolution Costs

In the fundamental failure model it is assumed that the resolution costs are exactly the same before and after the system failure occurs. However, this is a rather simplified assumption in terms of reality. Let $C_{R1}(T)$ and $C_{R2}(T)$ denote the resolution costs before and after the occurrence of a system failure, respectively (Case 2). Then the long-run average cost per unit time is given by

$$\begin{aligned} C_2(T) &= \frac{\lambda p\, C_D + \lambda^3 p^2 \int_0^T e^{-\lambda p x}\left[\int_0^x C_{R1}(y)\,dy\right]dx + \lambda^2 p\, e^{-\lambda p T} \int_0^T C_{R2}(y)\,dy}{1 - e^{-\lambda p T}} \\ &= A_2(T)/B_2(T), \end{aligned}$$

where $B_2(T) = 1 - e^{-\lambda p T}$.

Differentiating $C_2(T)$ with respect to $T$ and setting it equal to 0 yields $g_2(T)=0$, where

$$g_2(T) = \left(\lambda^3 p^2 \left[\int_0^T C_{R1}(y)\,dy - \int_0^T C_{R2}(y)\,dy\right] + \lambda^2 p\, C_{R2}(T)\right) B_2(T) - \lambda p A_2(T),$$

$$\begin{aligned} g_2(\infty) = {} & \left(\lambda^3 p^2 \left[\int_0^{\infty} C_{R1}(y)\,dy - \int_0^{\infty} C_{R2}(y)\,dy\right] + \lambda^2 p\, C_{R2}(\infty)\right) \\ & - \lambda p \left\{\lambda p\, C_D + \lambda^3 p^2 \int_0^{\infty} e^{-\lambda p x}\left[\int_0^x C_{R1}(y)\,dy\right]dx\right\} \end{aligned}$$

and

$$\frac{d g_2(T)}{dT} = \lambda^3 p^2 \left\{C_{R1}(T) - C_{R2}(T)\right\} B_2(T) + \lambda^2 p\, C_{R2}'(T) B_2(T),$$

if the function $C_{R2}(T)$ is differentiable with $C_{R2}'(T) = d C_{R2}(T)/dT$.
Theorem 2: Suppose that $C_{R1}(T) \geq C_{R2}(T)$ for an arbitrary $T$ $(>0)$. If $g_2(\infty) > 0$, there exists a unique optimal deadlock detection time $T^*$ $(0 < T^* < \infty)$ satisfying the non-linear equation $g_2(T^*) = 0$, and the corresponding long-run average cost per unit time is given by

$$C_2(T^*) = \lambda^2 p \left(\int_0^{T^*} C_{R1}(y)\,dy - \int_0^{T^*} C_{R2}(y)\,dy\right) + \lambda C_{R2}(T^*).$$

Otherwise, $T^* \to \infty$.

C. Special Case with Different Detection Costs

Next we consider different deadlock detection costs $C_{D1}$ and $C_{D2}$, which correspond to the respective probabilistic events $\tau_k \leq T$ and $T < \tau_k$ (Case 3). In a fashion similar to the previous argument, we have the long-run average cost per unit time:

$$\begin{aligned} C_3(T) &= \frac{\lambda p \left\{C_{D2} + (C_{D1} - C_{D2})\, e^{-\lambda p T}\right\} + \lambda^3 p^2 \int_0^T e^{-\lambda p x}\left[\int_0^x C_R(y)\,dy\right]dx + \lambda^2 p\, e^{-\lambda p T} \int_0^T C_R(y)\,dy}{1 - e^{-\lambda p T}} \\ &= A_3(T)/B_3(T). \end{aligned}$$

With the definitions

$$g_3(T) = \left\{\lambda p (C_{D2} - C_{D1}) + \lambda^2 p\, C_R(T)\right\} B_3(T) - \lambda p A_3(T),$$

$$g_3(\infty) = \left\{\lambda p (C_{D2} - C_{D1}) + \lambda^2 p\, C_R(\infty)\right\} - \lambda p \left\{\lambda p\, C_{D2} + \lambda^3 p^2 \int_0^{\infty} e^{-\lambda p x}\left[\int_0^x C_R(y)\,dy\right]dx\right\},$$

it is straightforward to obtain $d g_3(T)/dT = \lambda^2 p\, C_R'(T) B_3(T) > 0$, where $C_R'(T) = d C_R(T)/dT$.
Theorem 3: If $g_3(\infty) > 0$, there exists a unique optimal deadlock detection time $T^*$ $(0 < T^* < \infty)$ satisfying the non-linear equation $g_3(T^*) = 0$, and the corresponding long-run average cost per unit time is given by

$$C_3(T^*) = \lambda C_R(T^*) + C_{D2} - C_{D1}.$$

Otherwise, $T^* \to \infty$.

D. Special Case with Different Resolution/Detection Costs

Finally we combine the two previous cases with four different cost parameters, $C_{R1}(T)$, $C_{R2}(T)$, $C_{D1}$ and $C_{D2}$ (Case 4). From a simple calculation, we have

$$\begin{aligned} C_4(T) = {} & \frac{\lambda p \left\{C_{D2} + (C_{D1} - C_{D2})\, e^{-\lambda p T}\right\}}{1 - e^{-\lambda p T}} + \frac{\lambda^3 p^2 \int_0^T e^{-\lambda p x}\left[\int_0^x C_{R1}(y)\,dy\right]dx}{1 - e^{-\lambda p T}} \\ & + \frac{\lambda^2 p\, e^{-\lambda p T} \int_0^T C_{R2}(y)\,dy}{1 - e^{-\lambda p T}} = A_4(T)/B_4(T). \end{aligned}$$

Define

$$\begin{aligned} g_4(T) = {} & \left(\lambda p (C_{D2} - C_{D1}) + \lambda^3 p^2 \left[\int_0^T C_{R1}(y)\,dy - \int_0^T C_{R2}(y)\,dy\right] + \lambda^2 p\, C_{R2}(T)\right) B_4(T) \\ & - \lambda p A_4(T), \\ g_4(\infty) = {} & \left(\lambda p (C_{D2} - C_{D1}) + \lambda^3 p^2 \left[\int_0^{\infty} C_{R1}(y)\,dy - \int_0^{\infty} C_{R2}(y)\,dy\right] + \lambda^2 p\, C_{R2}(\infty)\right) \\ & - \lambda p \left\{\lambda p\, C_{D2} + \lambda^3 p^2 \int_0^{\infty} e^{-\lambda p x}\left[\int_0^x C_{R1}(y)\,dy\right]dx\right\}. \end{aligned}$$

Theorem 4: Suppose that $C_{R1}(T) \geq C_{R2}(T)$ for an arbitrary $T$ $(>0)$. If $g_4(\infty) > 0$, there exists a unique optimal deadlock detection time $T^*$ $(0 < T^* < \infty)$ satisfying the non-linear equation $g_4(T^*) = 0$, and the corresponding long-run average cost per unit time is given by

$$C_4(T^*) = C_{D2} - C_{D1} + \lambda^2 p \left[\int_0^{T^*} C_{R1}(y)\,dy - \int_0^{T^*} C_{R2}(y)\,dy\right] + \lambda C_{R2}(T^*).$$

Otherwise, $T^* \to \infty$.
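Since Case 4 contains Cases 1 to 3 as special cases, a single routine for $C_4(T)$ is enough to experiment with all of them. The sketch below is a minimal illustration and not the procedure used in the paper: it evaluates $C_4(T)$ from its defining expression and minimizes it directly by golden-section search, a generic technique swapped in here instead of solving $g_4(T^*)=0$. The function names and the argument convention (closed-form integrals of $C_{R1}$ and $C_{R2}$ passed as callables) are assumptions.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def c4(T, lam, p, c_d1, c_d2, int_c_r1, int_c_r2):
    """C_4(T); int_c_r1(x) and int_c_r2(x) return int_0^x C_R1 and int_0^x C_R2."""
    a = lam * p
    decay = math.exp(-a * T)
    outer = simpson(lambda x: math.exp(-a * x) * int_c_r1(x), 0.0, T)
    numer = (a * (c_d2 + (c_d1 - c_d2) * decay)
             + lam ** 3 * p ** 2 * outer
             + lam ** 2 * p * decay * int_c_r2(T))
    return numer / -math.expm1(-a * T)


def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                     # minimizer lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:                           # minimizer lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    lam, p, n = 0.01, 0.1, 500
    # Assumed costs from the numerical examples: C_R1 = 2 n^3 (1 - e^{-t}), C_R2 = n^3 (1 - e^{-t}).
    int1 = lambda x: 2 * n ** 3 * (x - 1.0 + math.exp(-x))
    int2 = lambda x: n ** 3 * (x - 1.0 + math.exp(-x))
    cost = lambda T: c4(T, lam, p, n ** 2, n, int1, int2)
    t_star = golden_min(cost, 1e-3, 50.0)
    print(f"T* = {t_star:.4f}, C4(T*) = {cost(t_star):.1f}")
```

Setting `c_d1 == c_d2` and passing the same integral for both resolution costs reduces the routine to Case 1, and letting `p` tend to zero recovers $C_0(T)$ as in Lemma 1.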
In this section we have derived the optimal deadlock detection times which minimize the long-run average costs per unit time in four cases. By analogy with the result of Ling et al. [11], it is not difficult to show, even in the above four cases, that periodic deadlock detection is not worse than aperiodic (variable) detection. Once the cost parameters $C_{R1}(T)$, $C_{R2}(T)$, $C_{D1}$ and $C_{D2}$ are given together with the arrival rate of deadlocks $\lambda$ and the system failure probability $p$, the optimal periodic deadlock detection schedule can be calculated. However, it is worth mentioning that the above cost parameters strongly depend on the deadlock detection/resolution algorithms employed and are regarded as functions of the number of processes. In the following section, we introduce the concept of message complexity by connecting the cost parameters with the deadlock detection and resolution algorithms from the viewpoint of computational complexity, and discuss the asymptotically optimal frequency of the deadlock detection schedule minimizing the long-run average message complexity per unit time in the presence of system failures.

TABLE I
ASYMPTOTIC NOTATION.

Symbol	Meaning	Definition
$f(n) \in O(g(n))$	asymptotically bounded from above	$\limsup_{n \to \infty} \frac{f(n)}{g(n)} < \infty$
$f(n) \in \Omega(g(n))$	asymptotically bounded from below	$\liminf_{n \to \infty} \frac{f(n)}{g(n)} > 0$
$f(n) \in \Theta(g(n))$	asymptotically bounded from above and below	$0 < \liminf_{n \to \infty} \frac{f(n)}{g(n)} \leq \limsup_{n \to \infty} \frac{f(n)}{g(n)} < \infty$

IV. ASYMPTOTIC ANALYSIS OF MESSAGE COMPLEXITY

A. Definition

The asymptotic notation describes how the optimal deadlock detection schedule minimizing the long-run average message complexity per unit time behaves as the number of processes in a distributed system grows. The Landau notation, named after Edmund Georg Hermann Landau, is a useful tool for measuring the order of computation. Table I summarizes the symbols used in the asymptotic notation. Using this, we present an asymptotic relationship between the optimal deadlock detection time and the number of distributed processes.

B. Analysis

Following Ling et al. [11], we investigate the asymptotic property of the optimal deadlock detection schedule minimizing the long-run average message complexity per unit time. First, consider the fundamental failure model in Case 1. Since the optimal deadlock detection time $T^*$ must satisfy the first-order condition of optimality, namely $g_1(T^*) = 0$, we have

$$\begin{aligned} \lambda C_R(T^*)\left(1 - e^{-\lambda p T^*}\right) = {} & \lambda p\, C_D + \lambda^3 p^2 \int_0^{T^*} e^{-\lambda p x}\left[\int_0^x C_R(y)\,dy\right]dx \\ & + \lambda^2 p\, e^{-\lambda p T^*} \int_0^{T^*} C_R(y)\,dy. \end{aligned} \tag{21}$$

Substituting the message complexities $C_D = 2n^2$ and $C_R(t) = c\,n\,n_D^2(t)$ of the best existing deadlock detection algorithm by Kshemkalyani and Singhal [8] and the best deadlock resolution algorithm by de Mendivil et al. [4], respectively, into Eq. (21), and applying the Maclaurin expansion to the exponential function $\exp\{-\lambda p T^*\}$, we derive

$$\begin{aligned} & c n^3 \left(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} c_i c_j (T^*)^{i+j} - \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \frac{(-\lambda p)^k (T^*)^{i+j+k}}{k!}\right) \\ & \quad = c \lambda^2 n^3 p^2 \int_0^{T^*} \left(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \frac{c_i c_j (-\lambda p)^k x^{i+j+k+1}}{k!\,(i+j+1)}\right) dx \\ & \qquad + c \lambda n^3 p \left(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \frac{c_i c_j (-\lambda p)^k (T^*)^{i+j+k+1}}{k!\,(i+j+1)}\right) + 2 p n^2 \\ & \quad = c \lambda^2 n^3 p^2 \left(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \frac{c_i c_j (-\lambda p)^k (T^*)^{i+j+k+2}}{k!\,(i+j+1)(i+j+k+2)}\right) \\ & \qquad + c \lambda n^3 p \left(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \frac{c_i c_j (-\lambda p)^k (T^*)^{i+j+k+1}}{k!\,(i+j+1)}\right) + 2 p n^2 \end{aligned} \tag{22}$$

for an arbitrary $n_D(t) = n \sum_{i=0}^{\infty} c_i t^i$ with $c_i = \{n_D(0)\}^i / i!$. Dividing both sides of Eq. (22) by $2 n^3 p$, we have

$$\begin{aligned} \frac{1}{n} = {} & -\frac{1}{2p}\, c \left(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} \frac{(-\lambda p)^k (T^*)^{i+j+k}}{k!}\right) \\ & - \frac{1}{2}\, c \lambda^2 p \left(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \frac{c_i c_j (-\lambda p)^k (T^*)^{i+j+k+2}}{k!\,(i+j+1)(i+j+k+2)}\right) \\ & - \frac{1}{2}\, c \lambda \left(\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \frac{c_i c_j (-\lambda p)^k (T^*)^{i+j+k+1}}{k!\,(i+j+1)}\right). \end{aligned} \tag{23}$$

For the first term on the right-hand side of Eq. (23), the order representation in the Landau notation is given as follows:

$$\Theta\left(\frac{\lambda (T^*)^3}{1 - T^*} - \frac{\lambda^2 p (T^*)^4}{1 - T^*} + \frac{\lambda^3 p^2 (T^*)^5}{1 - T^*} - \cdots\right).$$

Similar to the above result, we have the order representations for the second and third terms of Eq. (23) as

$$\Theta\left(\frac{\lambda^2 p (T^*)^4}{1 - T^*} - \frac{\lambda^3 p^2 (T^*)^5}{1 - T^*} + \frac{\lambda^4 p^3 (T^*)^6}{1 - T^*} - \cdots\right),$$

respectively. Since the lowest order is given by $(T^*)^3$, from the total sum of an infinite series, we have

$$\frac{1}{n} \in \Theta\left(\frac{\lambda (T^*)^3}{(1 + \lambda p T^*)(1 - T^*)}\right).$$

Then we obtain a representation of the long-run average message-complexity per unit time with the deadlock detection and resolution algorithms being used in Case 1, as well as the rate of deadlock formation λ\lambda.

Theorem 5: For a sufficiently large number of distributed processes $n$ and small $T^*$, the following results hold.

$$\frac{1}{n} \in O\left(\lambda (T^*)^3\right), \qquad T^* \in \Omega\left[\left(\frac{1}{\lambda n}\right)^{1/3}\right].$$

So, the optimal deadlock detection frequency $1/T^*$ can be represented asymptotically by $O\left((\lambda n)^{1/3}\right)$.

From the above result, we find that the asymptotically optimal deadlock detection frequency in the presence of system failures is the same as that in the absence of system failures [11]. This does not mean that the two optimal deadlock detection times are exactly the same for a fixed number of processes $n$. Rather, the two optimal deadlock detection schedules with and without system failures converge asymptotically to the same value as the number of processes increases.
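The scaling in Theorem 5 can be probed numerically by solving Eq. (21) for increasing $n$. The sketch below rests on an illustrative assumption that is not taken from the paper: $n_D(t) = n t$ for small $t$ (so $c_1 = 1$ and all other $c_i = 0$), which gives $C_R(t) = c\,n^3 t^2$, together with $C_D = 2n^2$. Under this assumption a leading-order expansion of Eq. (21) suggests $T^* \approx \{3/(c \lambda n)\}^{1/3}$, and the code compares the numerical root with that value. The function names are hypothetical.

```python
import math


def simpson(f, a, b, steps=1000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def t_star_case1(lam, p, n, c=1.0, tol=1e-12):
    """Root of Eq. (21) for the assumed costs C_D = 2 n^2 and C_R(t) = c n^3 t^2."""
    a = lam * p

    def c_r(t):
        return c * n ** 3 * t ** 2

    def int_c_r(x):                       # int_0^x C_R(y) dy
        return c * n ** 3 * x ** 3 / 3.0

    def residual(T):                      # left-hand side minus right-hand side of Eq. (21)
        outer = simpson(lambda x: math.exp(-a * x) * int_c_r(x), 0.0, T)
        lhs = lam * c_r(T) * (-math.expm1(-a * T))
        rhs = (a * 2 * n ** 2 + lam ** 3 * p ** 2 * outer
               + lam ** 2 * p * math.exp(-a * T) * int_c_r(T))
        return lhs - rhs

    lo, hi = 1e-9, 1.0
    while residual(hi) <= 0.0:            # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    lam, p = 1.0, 0.01
    for n in (10 ** 3, 10 ** 4, 10 ** 5):
        t_star = t_star_case1(lam, p, n)
        leading = (3.0 / (lam * n)) ** (1.0 / 3.0)
        print(n, t_star, t_star / leading)
```

If the assumption holds, the printed ratio stays close to one as $n$ grows, which matches the statement that the failure probability $p$ does not affect the asymptotic order.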

Similar to the above case, we investigate the asymptotic properties of the optimal deadlock detection frequency in Case 2, Case 3 and Case 4. Here we consider the situation where the best deadlock detection algorithm [8] and the best deadlock resolution algorithm [4] are not always available. This may be a realistic situation in which to distinguish between $C_{D1}$ and $C_{D2}$ and/or between $C_{R1}(t)$ and $C_{R2}(t)$ in Case 2, Case 3 and Case 4. In what follows, we consider the respective cases: $C_{D1} = 2n^2$ and $C_{D2} = 2n$, and $C_{R1}(t) = c n\{n_{D1}(t)\}^2$ and $C_{R2}(t) = c n\{n_{D2}(t)\}^2$ for $n_{D1}(t) \neq n_{D2}(t)$.

Theorem 6: In Case 2 with $C_D = 2n^2$, $C_{R1}(t) = c n\{n_{D1}(t)\}^2$ and $C_{R2}(t) = c n\{n_{D2}(t)\}^2$, it holds that

$$\frac{1}{n} \in \Theta\left(\frac{\lambda (T^*)^2}{1 - T^*}\right).$$

For sufficiently large $n$ and small $T^*$, the following results hold.

$$\frac{1}{n} \in O\left(\lambda (T^*)^2\right), \qquad T^* \in \Omega\left[\left(\frac{1}{\lambda n}\right)^{1/2}\right].$$

So, the optimal deadlock detection frequency $1/T^*$ can be represented asymptotically by $O\left((\lambda n)^{1/2}\right)$.

Theorem 7: In Case 3 with $C_{D 1}=2 n^{2}$, $C_{D 2}=2 n$ and $C_{R}(t)=c\left\{n_{D}(t)\right\}^{2}$, it holds that

$$\frac{1}{n^{2}} \in \Theta\left(\frac{\lambda\left(T^{*}\right)^{3}}{\left(1+\lambda p T^{*}\right)\left(1-T^{*}\right)}\right)$$

For sufficiently large $n$ and small $T^{*}$, the following results hold.

$$\begin{gathered} \frac{1}{n^{2}} \in O\left(\lambda\left(T^{*}\right)^{3}\right) \\ T^{*} \in \Omega\left[\left(\frac{1}{\lambda n^{2}}\right)^{1 / 3}\right] \end{gathered}$$

So, the optimal deadlock detection frequency $1 / T^{*}$ can be represented asymptotically by $O\left(\left(\lambda n^{2}\right)^{1 / 3}\right)$.

Theorem 8: In Case 4 with $C_{D 1}=2 n^{2}$, $C_{D 2}=2 n$, $C_{R 1}(t)=c n\left\{n_{D 1}(t)\right\}^{2}$ and $C_{R 2}(t)=c n\left\{n_{D 2}(t)\right\}^{2}$, it holds that

$$\frac{1}{n^{2}} \in \Theta\left(\frac{\lambda\left(T^{*}\right)^{2}}{1-T^{*}}\right)$$

For sufficiently large $n$ and small $T^{*}$, the following results hold.

$$\begin{gathered} \frac{1}{n^{2}} \in O\left(\lambda\left(T^{*}\right)^{2}\right) \\ T^{*} \in \Omega\left[\left(\frac{1}{\lambda n^{2}}\right)^{1 / 2}\right] \end{gathered}$$

So, the optimal deadlock detection frequency $1 / T^{*}$ can be represented asymptotically by $O\left(\left(\lambda n^{2}\right)^{1 / 2}\right)$.
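To compare the four asymptotic bounds side by side, the following Python sketch evaluates the growth term inside each $O(\cdot)$ for given $\lambda$ and $n$. It assumes that Theorem 5 is the Case 1 result, and it drops all constant factors, so the numbers indicate relative growth only, not actual detection frequencies. The names `BOUNDS` and `frequency_growth` are illustrative.

```python
# (power of n inside the bracket, order of the root) for each case,
# read off Theorems 5 to 8; Theorem 5 is assumed to be Case 1.
BOUNDS = {1: (1, 3), 2: (1, 2), 3: (2, 3), 4: (2, 2)}


def frequency_growth(case, n, lam):
    """Return (lam * n**k) ** (1/m), the growth term of the bound on 1/T*."""
    k, m = BOUNDS[case]
    return (lam * n ** k) ** (1.0 / m)


if __name__ == "__main__":
    lam = 0.01
    for n in (500, 1000, 5000, 10000):
        terms = [frequency_growth(case, n, lam) for case in (1, 2, 3, 4)]
        print(n, ["%.2f" % t for t in terms])
```

For a fixed $\lambda$, the Case 4 term grows linearly in $n$, while the Case 1 term grows only as $n^{1/3}$.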

V. NUMERICAL EXAMPLES

In this section we carry out a sensitivity analysis of the model parameters on the optimal deadlock detection scheduling for four cases, Case 1 through Case 4. Similar to Ling et al. [11], it is assumed in Case 1 that $C_{D}=n^{2}$ and $C_{R}(t)=n^{3}(1-\exp (-t))$. In Case 2 and Case 4, we set $C_{R 1}(t)=2 n^{3}(1-\exp (-t))$ and $C_{R 2}(t)=n^{3}(1-\exp (-t))$. In Case 3 and Case 4, $C_{D 1}=n^{2}$ and $C_{D 2}=n$ are assumed. Throughout the numerical examples, we vary the computational environment with $n=500, 1000, 5000, 10000$, $p=0.100, 0.010, 0.001, 0.000$ and $\lambda=0.01, 0.10, 1.00$. Physically, $\lambda=0.01$ means that the mean inter-arrival time between two successive deadlocks is $1 / \lambda=100$ (sec.).
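As a rough cross-check of the failure-free benchmark ($p=0$), the following Python sketch assumes that the long-run average cost has the form $C(T)=\left[C_{D}+\lambda \int_{0}^{T} C_{R}(t)\, dt\right] / T$. This form is an assumption made here for illustration and is not quoted from the text above. With the Case 1 settings it becomes $C(T)=\left[n^{2}+\lambda n^{3}\left(T-1+e^{-T}\right)\right] / T$, and setting its derivative to zero gives $\lambda n\left[1-(1+T) e^{-T}\right]=1$, which the sketch solves by bisection. The function names are illustrative.

```python
import math


def optimal_detection_time(n, lam, tol=1e-12):
    """Solve 1 - (1 + T) * exp(-T) = 1 / (lam * n) for T by bisection."""
    target = 1.0 / (lam * n)
    if not 0.0 < target < 1.0:
        raise ValueError("a finite optimum needs lam * n > 1")

    def g(t):
        # Increasing in t, negative at t = 0, tends to 1 - target > 0.
        return 1.0 - (1.0 + t) * math.exp(-t) - target

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def average_cost(T, n, lam):
    """Assumed p = 0 cost: (C_D + lam * integral of C_R over [0, T]) / T."""
    detection = n ** 2
    resolution = lam * n ** 3 * (T - 1.0 + math.exp(-T))
    return (detection + resolution) / T


if __name__ == "__main__":
    for n in (500, 1000, 5000, 10000):
        T = optimal_detection_time(n, 0.01)
        print(n, round(T, 6), "%.5E" % average_cost(T, n, 0.01))
```

With $n=500$ and $\lambda=0.01$ this gives $T^{*} \approx 0.8244$ and a cost of about $7.019 \times 10^{5}$, in line with the $p=0.000$ column of Table II.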

First consider Case 1. Figure 2 depicts the behavior of the long-run average message-complexity with respect to the deadlock detection time $T$ for varying failure probability $p$ with $\lambda=1$ and $n=500$. It can be seen that the optimal deadlock detection time depends only weakly on the failure probability $p$, although the corresponding long-run average message-complexity is influenced by changing the failure probability. In Fig. 3, we show the behavior of the long-run average message-complexity with respect to the deadlock detection time $T$ for a varying number of distributed processes with $\lambda=1$ and $p=0.01$. The number of processes influences the optimal deadlock detection scheduling in the early phase. As the deadlock detection time becomes large, the long-run average message-complexity converges to a constant level.

Fig. 2. Behavior of the long-run average message-complexity with respect to deadlock detection time in Case 1 with varying $p$ ($\lambda=1$ and $n=500$).

Fig. 3. Behavior of the long-run average message-complexity with respect to deadlock detection time in Case 1 with varying $n$ ($\lambda=1.00$ and $p=0.01$).

We investigate more carefully and quantitatively the dependence of the optimal deadlock detection schedule $T^{*}$ on the failure probability $p$ and the number of processes $n$ with fixed arrival rate ($\lambda=0.01$) in Case 1. Table II presents the optimal deadlock detection time and its associated minimum long-run average message-complexity per unit time for varying number of processes and failure probability. As the number of processes increases, the optimal deadlock detection time decreases but the corresponding message-complexity increases. On the other hand, the optimal deadlock detection time is insensitive to the failure probability $p$, whereas the message-complexity is very sensitive to it, as we have shown in Fig. 2. This is because the failure probability $p$ is set to a small value, where the case with $p=0$ corresponds to the existing result by Ling et al. [11]. When the number of distributed processes becomes 10 times larger, the optimal deadlock detection time $T^{*}$ becomes roughly a quarter of its former value, and the frequency of deadlock detection gradually increases. However, the corresponding long-run average message-complexity becomes around 350 times larger at maximum; that is, the cost of both deadlock detection and resolution depends strongly on the number of distributed processes.
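The two ratios quoted above can be recomputed from the tabulated values. The following Python sketch uses the $p=0.000$ entries of Table II; the helper name `tenfold_ratios` is illustrative.

```python
# Optimal detection times and minimum costs for p = 0.000, copied from Table II.
t_star = {500: 0.824388, 1000: 0.531812, 5000: 0.214699, 10000: 0.148554}
cost = {500: 7.01871e5, 1000: 4.12460e6, 5000: 2.41520e8, 10000: 1.38047e9}


def tenfold_ratios(table):
    """Return value(10 n) / value(n) for every n whose tenfold is tabulated."""
    return {n: table[10 * n] / table[n] for n in table if 10 * n in table}


time_ratios = tenfold_ratios(t_star)
cost_ratios = tenfold_ratios(cost)

for n in sorted(time_ratios):
    print(f"n = {n} -> {10 * n}: "
          f"T* ratio {time_ratios[n]:.3f}, cost ratio {cost_ratios[n]:.1f}")
```

Both tabulated tenfold steps ($500 \to 5000$ and $1000 \to 10000$) give a $T^{*}$ ratio between 0.26 and 0.28, and a cost ratio between roughly 335 and 344.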

TABLE II
DEPENDENCE OF THE OPTIMAL DEADLOCK DETECTION SCHEDULE $T^{*}$ ON FAILURE PROBABILITY $p$ AND NUMBER OF PROCESSES $n$ WITH FIXED ARRIVAL RATE ($\lambda=0.01$) IN CASE 1.
(a) Deadlock detection time.

$n$	$p=0.100$	$p=0.010$	$p=0.001$	$p=0.000$
500	0.824530	0.824403	0.824390	0.824388
1000	0.531866	0.531839	0.531817	0.531812
5000	0.214707	0.214703	0.214700	0.214699
10000	0.148558	0.148556	0.148556	0.148554

(b) Long-run average message-complexity per unit time.

$n$	$p=0.100$	$p=0.010$	$p=0.001$	$p=0.000$
500	7.01948E+05	7.01910E+05	7.01879E+05	7.01871E+05
1000	4.12492E+06	4.12476E+06	4.12464E+06	4.12460E+06
5000	2.41528E+08	2.41524E+08	2.41521E+08	2.41520E+08
10000	1.38050E+09	1.38049E+09	1.38048E+09	1.38047E+09

TABLE III
DEPENDENCE OF THE OPTIMAL DEADLOCK DETECTION SCHEDULE $T^{*}$ ON FAILURE PROBABILITY $p$ AND NUMBER OF PROCESSES $n$ WITH FIXED ARRIVAL RATE ($\lambda=0.01$) IN CASE 2.
(a) Deadlock detection time.

$n$	$p=0.100$	$p=0.010$	$p=0.001$	$p=0.000$
500	0.824143	0.824266	0.824364	0.824388
1000	0.531733	0.531772	0.531805	0.531812
5000	0.214690	0.214694	0.214698	0.214699
10000	0.148551	0.148553	0.148555	0.148554

(b) Long-run average message-complexity per unit time.

$n$	$p=0.100$	$p=0.010$	$p=0.001$	$p=0.000$
500	7.02065E+05	7.01968E+05	7.01890E+05	7.01871E+05
1000	4.12534E+06	4.12497E+06	4.12464E+06	4.12460E+06
5000	2.41537E+08	2.41528E+08	2.41522E+08	2.41520E+08
10000	1.38054E+09	1.38051E+09	1.38047E+09	1.38047E+09

Table III presents the dependence of the optimal deadlock detection schedule $T^{*}$ on the failure probability $p$ and the number of processes $n$ with fixed arrival rate ($\lambda=0.01$) in Case 2. In spite of the difference in deadlock resolution costs for a fixed deadlock detection cost, the resulting optimal deadlock detection time is insensitive to varying $p$ and takes almost the same value as that in Table II. This tendency is also observed in the minimum long-run average message-complexity per unit time, so that the long-run average message-complexity is very sensitive to varying $p$ and $n$, but insensitive to the difference in resolution costs. However, if the failure probability $p$ increases, it can be checked that the difference in deadlock resolution costs affects the distributed deadlock detection/resolution scheduling. In Tables IV and V, we summarize the comparison results in Case 3 and Case 4, respectively. These results suggest that the asymptotically optimal deadlock detection/resolution schedule is robust to variation in both deadlock detection and resolution costs, and that the message complexity depends strongly on the number of distributed processes and the failure probability. These lessons learned from the sensitivity analysis enable us to quantify the message-complexity with different message size and failure probability when the optimal deadlock detection/resolution scheduling is performed.
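The robustness of $T^{*}$ across the four cost settings can be quantified from the tabulated values. The following Python sketch takes the $p=0.100$ column of part (a) of Tables II through V and computes, for each $n$, the relative spread of $T^{*}$ over Case 1 to Case 4. The spread measure (maximum minus minimum, divided by the minimum) is a choice made here for illustration.

```python
# T* at p = 0.100, copied from part (a) of Tables II-V.
# Each list holds the values for Case 1, Case 2, Case 3 and Case 4.
t_star_p010 = {
    500: [0.824530, 0.824143, 0.824140, 0.824136],
    1000: [0.531866, 0.531733, 0.531732, 0.531728],
    5000: [0.214707, 0.214690, 0.214688, 0.214682],
    10000: [0.148558, 0.148551, 0.148549, 0.148546],
}


def relative_spread(values):
    """Return (max - min) / min of the given values."""
    return (max(values) - min(values)) / min(values)


spreads = {n: relative_spread(v) for n, v in t_star_p010.items()}

for n, s in spreads.items():
    print(f"n = {n}: relative spread of T* across cases = {s:.2e}")
```

For every tabulated $n$ the spread stays below $10^{-3}$, that is, below 0.1 percent, even at the largest failure probability considered.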

TABLE IV
DEPENDENCE OF THE OPTIMAL DEADLOCK DETECTION SCHEDULE $T^{*}$ ON FAILURE PROBABILITY $p$ AND NUMBER OF PROCESSES $n$ WITH FIXED ARRIVAL RATE ($\lambda=0.01$) IN CASE 3.
(a) Deadlock detection time.

$n$	$p=0.100$	$p=0.010$	$p=0.001$	$p=0.000$
500	0.824140	0.824198	0.824298	0.824388
1000	0.531732	0.531748	0.531788	0.531812
5000	0.214688	0.214690	0.214696	0.214699
10000	0.148549	0.148550	0.148552	0.148554

(b) Long-run average message-complexity per unit time.

$n$	$p=0.100$	$p=0.010$	$p=0.001$	$p=0.000$
500	7.02045E+05	7.02001E+05	7.01945E+05	7.01871E+05
1000	4.12545E+06	4.12513E+06	4.12478E+06	4.12460E+06
5000	2.41539E+08	2.41532E+08	2.41526E+08	2.41520E+08
10000	1.38061E+09	1.38054E+09	1.38049E+09	1.38047E+09

TABLE V
DEPENDENCE OF THE OPTIMAL DEADLOCK DETECTION SCHEDULE $T^{*}$ ON FAILURE PROBABILITY $p$ AND NUMBER OF PROCESSES $n$ WITH FIXED ARRIVAL RATE ($\lambda=0.01$) IN CASE 4.
(a) Deadlock detection time.

$n$	$p=0.100$	$p=0.010$	$p=0.001$	$p=0.000$
500	0.824136	0.824172	0.824258	0.824388
1000	0.531728	0.531742	0.531765	0.531812
5000	0.214682	0.214684	0.214690	0.214699
10000	0.148546	0.148548	0.148551	0.148554

(b) Long-run average message-complexity per unit time.

$n$	$p=0.100$	$p=0.010$	$p=0.001$	$p=0.000$
500	7.02345E+05	7.02134E+05	7.01953E+05	7.01871E+05
1000	4.12645E+06	4.12583E+06	4.12523E+06	4.12460E+06
5000	2.41631E+08	2.41586E+08	2.41532E+08	2.41520E+08
10000	1.38064E+09	1.38060E+09	1.38051E+09	1.38047E+09

VI. CONCLUSIONS

In this paper, we have reformulated Ling et al.'s deadlock detection scheduling problem [11] in the presence of system failures, and derived the optimal deadlock detection time minimizing the long-run average cost per unit time. By introducing the message-complexities of the deadlock detection and resolution algorithms being used, we have investigated the asymptotically optimal frequency of deadlock detection scheduling in terms of the number of distributed processes through the well-known Landau notation. The analytical results are direct extensions of Ling et al.'s work, but make the deadlock detection and resolution scheduling more realistic by accounting for the occurrence of system failures. We have also shown that the number of distributed processes and the system failure probability have a great effect on the long-run average message-complexity per unit time, but not on the optimal deadlock detection time. This lesson is also quite important, because no previous work has connected the message complexity of distributed systems with the system failure phenomenon.

In the future, we plan to extend the resulting stochastic models further from the viewpoint of different cost criteria. For instance, though the present models took the presence of system failures into account, the main objective was to minimize the message-complexity and not to optimize the distributed deadlock detection scheduling in terms of reliability and safety. Also, by using the rate of deadlock formation measured through experiments, the resulting scheduling algorithms should be implemented so that they adapt to the varying number of distributed processes. Such an adaptive scheduler should be developed to realize dependable distributed systems monitoring with effective deadlock detection/resolution algorithms.

Acknowledgments This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (C), Grant No. 21510167 (2009-2012), and the Research Program 2010 under the Center for Academic Development and Cooperation of the Hiroshima Shudo University, Japan.

REFERENCES

[1] G. Bracha and S. Toueg, Distributed deadlock detection, Distributed Computing, 2, 127-138, 1987.
[2] K. M. Chandy, J. Misra and L. M. Haas, Distributed deadlock detection, ACM Transactions on Computer Systems, 1 (2), 144-156, 1983.
[3] S. Chen and Y. Ling, Stochastic analysis of distributed deadlock scheduling, Proceedings of the 24th Annual ACM symposium on Principles of Distributed Computing, 265-273, ACM, 2005.
[4] J. R. G. de Mendivil, J. R. Garitagoitia, C. F. Alastruey, and J. M. Bernabeu-Auban, A distributed deadlock resolution algorithm for the AND model, IEEE Transactions on Parallel and Distributed Systems, 10 (5), 433-447, 1999.
[5] R. C. Holt, Some deadlock properties of computer systems, ACM Computing Surveys, 4 (3), 179-196, 1972.
[6] E. Knapp, Deadlock detection in distributed databases, ACM Computing Surveys, 19 (4), 303-328, 1987.
[7] N. Krivokapić, A. Kemper and E. Gudes, Deadlock detection in distributed database systems: a new algorithm and a comparative performance analysis, The International Journal on Very Large Data Bases, 8 (2), 79-100, 1999.
[8] A. D. Kshemkalyani and M. Singhal, A one-phase algorithm to detect distributed deadlock in replicated databases, IEEE Transactions on Knowledge and Data Engineering, 11 (6), 880-895, 1999.
[9] S. Lee and J. L. Kim, Performance analysis of distributed deadlock detection algorithms, IEEE Transactions on Knowledge and Data Engineering, 13 (4), 623-636, 2001.
[10] S. Lee, Centralized detection and resolution of distributed deadlocks in the generalized model, IEEE Transactions on Software Engineering, 30 (8), 561-573, 2004.
[11] Y. Ling, S. Chen, and C. J. Chiang, On optimal deadlock detection scheduling, IEEE Transactions on Computers, 55 (9), 1178-1187, 2006.
[12] M. J. Merritt, A distributed algorithm for deadlock detection and resolution, Proceedings of the 3rd Annual ACM symposium on Principles of Distributed Computing, 282-284, ACM, 1984.
[13] R. Obermarck, Distributed deadlock detection algorithm, ACM Transactions on Database Systems, 7 (2), 187-208, 1982.
[14] Y. C. Park, P. Scheuermann and S. H. Lee, A periodic deadlock detection and resolution algorithm with a new graph model for sequential transaction processing, Proceedings of the 8th International Conference on Data Engineering, IEEE CS, 202-209, 1992.
[15] M. Roesler and W. A. Burkhard, Semantic lock models in object-oriented distributed systems and deadlock resolution, Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, 361-370, ACM, 1988.
[16] M. Singhal, Deadlock detection in distributed systems, IEEE Computer, 22 (11), 37-48, 1989.
[17] I. Terekhov and T. Camp, Time efficient deadlock resolution algorithms, Information Processing Letters, 69 (3), 149-154, 1999.
[18] S. Warnakulasuriya and T. M. Pinkston, A formal model of message blocking and deadlock resolution in interconnection networks, IEEE Transactions on Parallel and Distributed Systems, 11 (3), 212-229, 2000.
