Iman J Yusuf | University of Somalia

Papers by Iman J Yusuf

Research paper thumbnail of Chiminey: Reliable Computing and Data Management Platform in the Cloud

2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, 2015

Moving scientific experiments that are embarrassingly parallel, long running and data-intensive into a cloud-based execution environment is a desirable, though complex, undertaking for many researchers. The management of such virtual environments is cumbersome and not necessarily within the core skill set of scientists and engineers. We present Chiminey, a software platform that enables researchers to (i) run applications on both traditional high-performance computing and cloud-based computing infrastructures, (ii) handle failure during execution, (iii) curate and visualise execution outputs, (iv) share such data with collaborators or the public, and (v) search for publicly available data.

Research paper thumbnail of Metamorphic fault tolerance: an automated and systematic methodology for fault tolerance in the absence of test oracle

Companion Proceedings of the 36th International Conference on Software Engineering, 2014

A system may fail due to an internal bug or a fault in its execution environment. Incorporating fault tolerance strategies enables such a system to complete its function despite the failure of some of its parts. Some fault tolerance strategies require failure detection before they can act. Detecting incorrect output, for instance, assumes the existence of an oracle that checks the correctness of program outputs for a given input. However, in many practical situations, an oracle does not exist or is extremely difficult to apply. This oracle problem is a major challenge in software testing. In this paper, we propose applying metamorphic testing, a software testing method that alleviates the oracle problem, to fault tolerance. The proposed technique supports failure detection without the need for oracles.
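The core idea of oracle-free failure detection can be illustrated with a minimal sketch (the function and relation here are illustrative, not taken from the paper): instead of checking an output against a known correct value, we check that two related executions satisfy a metamorphic relation, such as sin(x) = sin(π − x).

```python
import math

def violates_metamorphic_relation(f, x, tol=1e-9):
    """Oracle-free failure check: for f = sin, the relation
    f(x) == f(pi - x) must hold for any correct implementation.
    A violation signals a failure without knowing the true sin(x)."""
    return abs(f(x) - f(math.pi - x)) > tol

# A correct implementation satisfies the relation ...
assert not violates_metamorphic_relation(math.sin, 0.5)

# ... while a faulty one (an illustrative injected bug) is detected,
# because the relation pairs an input below 1 with one above it.
buggy_sin = lambda x: math.sin(x) + (0.1 if x > 1 else 0.0)
assert violates_metamorphic_relation(buggy_sin, 0.5)
```

The point is that neither check needed the expected value of sin(0.5); the relation between the two outputs serves as the failure detector.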

Research paper thumbnail of Evaluating recovery aware components for grid reliability

Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering on European software engineering conference and foundations of software engineering symposium - ESEC/FSE '09, 2009

Failure in grids is costly and inevitable. Existing fault tolerance (FT) mechanisms are typically defensive and reactive, and thus unnecessarily costly. In this paper we propose a hybrid FT approach, the recovery aware component (RAC), which combines reactive and proactive FT with failure recovery or aversion at user-defined granularity, using component orientation and architecture-level reasoning about FT to increase reliability and availability without needless performance sacrifices. We model and analyse a parameterised RAC implementation that combines prediction, proactive rejuvenation and reactive restarting to varying extents, calculating cost savings, reliability improvements and cost-benefit under parameters such as prediction frequency and accuracy.
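The trade-off between proactive and reactive FT can be sketched with a toy simulation (the cost figures and parameters are illustrative assumptions, not the paper's model): a correctly predicted failure is averted by cheap rejuvenation, while an unpredicted one triggers an expensive reactive restart, so predictor accuracy drives the overall FT cost.

```python
import random

def simulate_hybrid_ft(steps, p_fail, predictor_accuracy,
                       rejuvenate_cost=1.0, restart_cost=10.0, seed=0):
    """Toy model of hybrid FT: each step may fail with probability
    p_fail; a predicted failure costs a cheap proactive rejuvenation,
    an unpredicted one costs an expensive reactive restart.
    All costs and probabilities are illustrative."""
    rng = random.Random(seed)
    cost = 0.0
    for _ in range(steps):
        will_fail = rng.random() < p_fail
        predicted = will_fail and rng.random() < predictor_accuracy
        if predicted:
            cost += rejuvenate_cost      # proactive: avert the failure
        elif will_fail:
            cost += restart_cost         # reactive: recover after the fact
    return cost

# A more accurate predictor averts more failures cheaply:
assert simulate_hybrid_ft(1000, 0.05, 0.9) < simulate_hybrid_ft(1000, 0.05, 0.1)
```

Varying `predictor_accuracy` and the cost ratio in such a model is one simple way to see why a hybrid scheme can beat a purely reactive one.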

Research paper thumbnail of Parameterised architectural patterns for providing cloud service fault tolerance with accurate costings

Proceedings of the 16th International ACM Sigsoft symposium on Component-based software engineering - CBSE '13, 2013

Cloud computing presents a unique opportunity for science and engineering, with benefits over traditional high-performance computing, especially for smaller compute jobs and for users new to parallel computing. However, doubts remain about production high-performance computing in the cloud, the so-called science cloud, as predictable performance, reliability and therefore costs remain elusive for many applications. This paper uses parameterised architectural patterns to assist with fault tolerance and cost predictions for science clouds, in which a single job typically holds many virtual machines for a long time, communication can involve massive data movements, and buffered streams allow parallel processing to proceed while data transfers are still incomplete. We use predictive models, simulation and actual runs to estimate run times with acceptable accuracy for two of the most common architectural patterns for data-intensive scientific computing: MapReduce and Combinational Logic. Run times are fundamental to understanding the fee-for-service costs of clouds, which are typically charged by the hour and by the number of compute nodes or cores used. We evaluate our models using realistic cloud experiments from collaborative physics research projects and show that proactive and reactive fault tolerance is manageable, predictable and composable, in principle, especially at the architectural level.
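The link between run-time prediction and fee-for-service cost can be made concrete with a minimal sketch (the billing granularity and rate are illustrative assumptions; real providers differ): since clouds typically charge by the hour per node, a predicted run time translates into a cost only after rounding up to the billing unit.

```python
import math

def cloud_job_cost(predicted_runtime_hours, num_nodes, hourly_rate_per_node):
    """Fee-for-service estimate: bill whole hours per node, so the
    predicted run time is rounded up before charging. Hourly billing
    granularity is an illustrative assumption."""
    billed_hours = math.ceil(predicted_runtime_hours)
    return billed_hours * num_nodes * hourly_rate_per_node

# e.g. a job predicted to run 6.3 h on 16 nodes at $0.50 per node-hour
# is billed for 7 full hours:
assert cloud_job_cost(6.3, 16, 0.50) == 56.0
```

The rounding step is why run-time prediction accuracy matters: an error that crosses an hour boundary changes the bill for every node in the job.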

Research paper thumbnail of Impact of Nanoscale Roughness of Titanium Thin Film Surfaces on Bacterial Retention

Langmuir, 2010

Two human pathogenic bacteria, Staphylococcus aureus CIP 68.5 and Pseudomonas aeruginosa ATCC 9025, were adsorbed onto surfaces containing Ti thin films of varying thickness to determine the extent to which nanoscale surface roughness influences bacterial attachment. A magnetron sputter thin film system was used to deposit titanium films with thicknesses of 3, 12, and 150 nm on glass substrata, with corresponding surface roughness parameters Rq of 1.6, 1.2, and 0.7 nm (on a 4 μm × 4 μm scanning area). The chemical composition, wettability, and surface architecture of the titanium thin films were characterized using X-ray photoelectron spectroscopy, contact angle measurements, atomic force microscopy, three-dimensional interactive visualization, and statistical approximation of the topographic profiles. Investigation of the dynamic evolution of the Ti thin film topographic parameters indicated that three commonly used parameters, Ra, Rq, and Rmax, were insufficient to effectively characterize the nanoscale rough/smooth surfaces. Two additional parameters, Rskw and Rkur, which describe the statistical distribution of the roughness, were found to be useful for evaluating the surface architecture. Analysis of bacterial retention profiles indicated that bacteria responded to changes of less than 1 nm in the Ra and Rq surface roughness parameters of the Ti thin films by (i) an increase in the number of retained cells by a factor of 2-3, and (ii) an elevated level of secretion of extracellular polymeric substances.
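The roughness parameters named above have standard statistical definitions, which a short sketch makes concrete (the formulas are the textbook definitions of amplitude roughness parameters, not taken from the paper, and the sample profile is invented): Ra is the mean absolute deviation of heights, Rq the RMS deviation, and Rskw and Rkur the skewness and kurtosis of the height distribution.

```python
import math

def roughness_parameters(heights):
    """Standard amplitude roughness statistics for a height profile
    (e.g. AFM heights in nm), computed about the mean line."""
    n = len(heights)
    mean = sum(heights) / n
    dev = [z - mean for z in heights]
    ra = sum(abs(d) for d in dev) / n                # mean absolute deviation
    rq = math.sqrt(sum(d * d for d in dev) / n)      # RMS roughness
    rskw = sum(d ** 3 for d in dev) / (n * rq ** 3)  # skewness of heights
    rkur = sum(d ** 4 for d in dev) / (n * rq ** 4)  # kurtosis of heights
    return ra, rq, rskw, rkur

# A height-symmetric (invented) profile has zero skewness, and Rq >= Ra:
ra, rq, rskw, rkur = roughness_parameters([-1.0, 0.0, 1.0, 0.0, -1.0, 1.0])
assert abs(rskw) < 1e-12 and rq > ra
```

This also shows why Rskw and Rkur add information: two profiles can share Ra and Rq yet differ in how their peaks and valleys are distributed.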

Research paper thumbnail of Architecture-based fault tolerance support for grid applications

Proceedings of the joint ACM SIGSOFT conference -- QoSA and ACM SIGSOFT symposium -- ISARCS on Quality of software architectures -- QoSA and architecting critical systems -- ISARCS - QoSA-ISARCS '11, 2011

Failure in long running grid applications is arguably inevitable and costly. Therefore, fault tolerance (FT) support for grid applications is needed. This paper evaluates an extension of our prior work on Recovery Aware Components (RAC), a component-based FT approach. Our extension classifies grid application architectures into a small number of architectural classes. In this paper, we evaluate the MapReduce architecture only and analyze the reliability improvement MapReduce applications would gain by adopting the RAC approach. Our analysis shows that significant increases in reliability are possible at moderate extra cost. Naturally, the cost of FT depends on the failure rate of the managed system, i.e., the system to be protected from faults, and the FT strategy chosen. Our work aims to give High Performance Computing (HPC) software architects the tools to control these factors for different grid application architectures.
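Why architecture-level FT pays off for MapReduce can be illustrated with a toy reliability model (this is a deliberately simplified sketch under an independence assumption, not the paper's analysis): a job of many independent tasks succeeds only if every task succeeds within its retry budget, so even modest per-task retry support transforms job-level reliability.

```python
def mapreduce_job_reliability(num_tasks, p_task_fail, max_retries):
    """Toy model: each task fails independently with probability
    p_task_fail per attempt and may be retried; the job succeeds iff
    every task eventually succeeds. Parameters are illustrative."""
    p_task_gives_up = p_task_fail ** (max_retries + 1)
    return (1.0 - p_task_gives_up) ** num_tasks

# With 1000 tasks and a 1% per-attempt failure rate, a job without
# retries almost always fails, while two retries per task nearly
# guarantee success:
no_ft = mapreduce_job_reliability(1000, 0.01, 0)
with_ft = mapreduce_job_reliability(1000, 0.01, 2)
assert no_ft < 0.001 and with_ft > 0.998
```

The multiplicative structure is the key point: job reliability decays exponentially in the task count, so FT applied at the task level of the architecture is what keeps large jobs viable.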
