Reliability Analysis in Parallel and Distributed Systems with Network Contention

This paper tackles the reliability problem of task allocation in heterogeneous distributed systems in the presence of network contention. A large number of scheduling heuristics have been presented in the literature, but most of them aim to maximize system reliability without taking network contention delay into consideration. In this paper, we deal with a more realistic model for heterogeneous networks of workstations by treating network contention as an important factor in our study. Although network contention is usually not considered in task scheduling, it has a great effect on the execution time of a parallel program. In our work, we rely on the hybrid algorithm investigated in [8], but with a new system model that allows us to capture network contention. We first develop a mathematical model for reliability based on the unreliability cost function caused by the execution of tasks on the system processors and by the inter-processor communication link where network contention caused ...
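
As an illustration of what an unreliability cost function of this kind might look like, the following minimal sketch assumes the exponential failure model that is common in reliability-aware allocation: each processor and each link has a constant failure rate, and the cost is the sum of failure rate times busy time over all processors and links. This is an assumption for illustration, not necessarily the model developed in this paper. All names and numbers (`unreliability_cost`, `reliability`, the failure rates, the contention delay) are hypothetical, and charging contention delay to the link's exposure time is itself an assumption.

```python
import math


def unreliability_cost(exec_time, proc_failure_rate, comm_time, link_failure_rate):
    """Sum of (failure rate * busy time) over processors and links.

    exec_time:         processor -> total execution time of the tasks placed on it
    proc_failure_rate: processor -> constant failure rate (assumed model)
    comm_time:         link -> total time the link is occupied, here assumed
                       to include any contention delay
    link_failure_rate: link -> constant failure rate (assumed model)
    """
    proc_cost = sum(proc_failure_rate[p] * t for p, t in exec_time.items())
    link_cost = sum(link_failure_rate[l] * t for l, t in comm_time.items())
    return proc_cost + link_cost


def reliability(cost):
    """Under the assumed exponential model, reliability is exp(-cost)."""
    return math.exp(-cost)


# Hypothetical two-processor system joined by one link.
exec_time = {"p1": 10.0, "p2": 20.0}
proc_failure_rate = {"p1": 1e-3, "p2": 2e-3}
link_failure_rate = {("p1", "p2"): 4e-3}

# Link occupancy without contention, and with an extra (hypothetical)
# contention delay of 5 time units added to the transfer.
comm_no_contention = {("p1", "p2"): 5.0}
comm_with_contention = {("p1", "p2"): 5.0 + 5.0}

cost_ideal = unreliability_cost(
    exec_time, proc_failure_rate, comm_no_contention, link_failure_rate
)
cost_contended = unreliability_cost(
    exec_time, proc_failure_rate, comm_with_contention, link_failure_rate
)

print(f"cost without contention: {cost_ideal:.4f}")
print(f"cost with contention:    {cost_contended:.4f}")
print(f"reliability without contention: {reliability(cost_ideal):.4f}")
print(f"reliability with contention:    {reliability(cost_contended):.4f}")
```

Under these assumptions, a longer link occupancy raises the cost and lowers the reliability estimate, which is why an allocation that looks best in a contention-free model may not remain best once contention is modeled.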