An Empirical Analysis on Effectiveness of Source Code Metrics for Aging Related Bug Prediction

Predicting Aging-Related Bugs using Software Complexity Metrics

Long-running software systems tend to show degraded performance and an increased failure rate. This problem, known as Software Aging, is typically related to the runtime accumulation of error conditions and is caused by the activation of so-called Aging-Related Bugs (ARBs). This paper aims to predict the location of Aging-Related Bugs in complex software systems, so as to aid their identification during testing. First, we carried out a bug data analysis on three large software projects to collect data about ARBs. Then, a set of software complexity metrics was selected and extracted from the three projects. Finally, using these metrics as predictor variables for machine learning algorithms, we built fault prediction models that can be used to predict which source code files are more prone to Aging-Related Bugs.

… software fault injection, dependability assessment techniques, and field-based measurement techniques. Domenico Cotroneo has served as a Program Committee member at a number of scientific conferences on dependability topics, including DSN, EDCC, ISSRE, SRDS, and LADC, and he is involved in several national and European projects in the context of dependable systems.
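The final step (metrics in, ARB-proneness out) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's method:

- The two metrics (lines of code, cyclomatic complexity) are hypothetical stand-ins for the paper's metric set.
- The log transform, the logistic regression learner, and the eight-file toy dataset are also hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_features(raw: Sequence[float]) -> List[float]:
    # Size/complexity metrics are heavily skewed, so log1p is applied (assumption).
    return [math.log1p(v) for v in raw]

def fit_scaler(X: List[List[float]]) -> Tuple[List[float], List[float]]:
    # Per-feature mean and population standard deviation for z-scoring.
    n = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(n)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / len(X)) or 1.0
            for j in range(n)]
    return means, stds

def scale(x, means, stds):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    n, m = len(X[0]), len(X)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / m for wj, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b

# Hypothetical per-file metrics: (lines of code, cyclomatic complexity); label 1 = file had an ARB.
RAW = [(120, 4), (300, 8), (450, 10), (200, 6),
       (4000, 90), (6000, 150), (3500, 70), (8000, 200)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

X_LOG = [log_features(r) for r in RAW]
MEANS, STDS = fit_scaler(X_LOG)
W, B = train_logistic([scale(r, MEANS, STDS) for r in X_LOG], LABELS)

def arb_proneness(loc: float, cc: float) -> float:
    """Estimated probability that a file with these metrics contains an ARB."""
    x = scale(log_features((loc, cc)), MEANS, STDS)
    return sigmoid(sum(wj * xj for wj, xj in zip(W, x)) + B)

print(round(arb_proneness(7000, 180), 3), round(arb_proneness(150, 5), 3))
```

Ranking files by `arb_proneness` is one way such a model could steer testing effort toward the files most likely to hide ARBs.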

Analysis of Ensemble Models for Aging Related Bug Prediction in Software Systems

Proceedings of the 13th International Conference on Software Technologies, 2018

With the evolution of the software industry, growing software complexity has led to an increase in the number of software faults. According to one study, software faults are responsible for many unplanned system outages and affect the reputation of the company. Many techniques have been proposed to avoid software failures, yet such failures remain common. Many software faults and failures are outcomes of a phenomenon called software aging. In this work, we present the use of various ensemble models to develop an approach for predicting Aging-Related Bugs (ARBs). We present a comparative analysis of the ensemble techniques bagging, boosting, and stacking against the base learning techniques, a comparison that has not previously been explored for ARB prediction. The experimental study has been performed on the Linux and MySQL bug datasets collected from the Software Aging and Rejuvenation Repository.
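The abstract names bagging, boosting and stacking without giving details. As a hedged illustration of the first technique only, the sketch below bags decision stumps: each stump is fit on a bootstrap resample of the training rows, and predictions are combined by majority vote.

The following are all assumptions, not the study's configuration: the stump learner, `n_estimators=25`, the fixed seed, and the two-feature toy data.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)

def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, t, polarity = stump
    # Polarity +1 predicts "ARB-prone" above the threshold, -1 below it.
    return 1 if polarity * (x[j] - t) > 0 else 0

def fit_stump(X, y) -> Stump:
    """Exhaustively pick the single-feature split with the fewest training errors."""
    best, best_err = None, None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # One candidate below all values (constant prediction) plus all midpoints.
        candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in candidates:
            for polarity in (1, -1):
                err = sum(stump_predict((j, t, polarity), xi) != yi for xi, yi in zip(X, y))
                if best_err is None or err < best_err:
                    best, best_err = (j, t, polarity), err
    return best

def fit_bagging(X, y, n_estimators: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows drawn with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models: Sequence[Stump], x: Sequence[float]) -> int:
    votes = Counter(stump_predict(m, x) for m in models)
    return 1 if votes[1] > votes[0] else 0

# Hypothetical normalized metric vectors; label 1 = module affected by an ARB.
X_TOY = [[1, 2], [2, 1], [3, 3], [2, 2], [8, 7], [9, 9], [7, 8], [8, 9]]
Y_TOY = [0, 0, 0, 0, 1, 1, 1, 1]
MODELS = fit_bagging(X_TOY, Y_TOY)
print(bagging_predict(MODELS, [1, 1]), bagging_predict(MODELS, [9, 8]))
```

Boosting would instead reweight misclassified rows between rounds. Stacking would train a meta-learner on the base models' outputs.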

Predicting software aging related bugs from imbalanced datasets by using data mining techniques

Software aging bugs are related to the lifespan of the software. Rebooting is one solution to this problem; however, it is time-consuming and causes resource loss. It is difficult to detect these bugs during the time-limited software testing process. Data mining techniques can be useful to predict whether a piece of software has aging-related bugs or not. The available datasets of software aging bugs present a challenge because they are imbalanced: the number of data points with bugs is very small compared to the number of data points with no bugs, yet it is the rare class (bugs) that is important to predict. In this paper, we carried out experiments with a dataset containing data points related to aging-related bugs found in the open-source MySQL DBMS. Data mining techniques developed for imbalanced datasets were compared with general data mining techniques using various performance measures. The results suggest that data mining techniques developed for imbalanced datasets are more useful for correctly predicting data points related to aging-related bugs: they performed better than general data mining techniques on the G-mean measure, which is an important performance measure for imbalanced datasets.
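The G-mean is the geometric mean of sensitivity (recall on the bug class) and specificity (recall on the no-bug class). It is why a classifier that always predicts "no bug" scores well on accuracy but zero on G-mean.

The sketch below computes it and adds random oversampling as one simple imbalance-handling technique. Oversampling is an assumption here; the abstract does not say which imbalance techniques the paper used. The label vectors are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp

def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (bug class) and specificity (no-bug class)."""
    tp, fn, tn, fp = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def random_oversample(X, y, seed: int = 0):
    """Duplicate random minority-class (label 1) rows until both classes are equally large."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    X_bal = majority + minority + extra
    y_bal = [0] * len(majority) + [1] * (len(minority) + len(extra))
    return X_bal, y_bal

# Hypothetical labels: 2 ARB-affected data points out of 10.
Y_TRUE = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ALWAYS_CLEAN = [0] * 10
print(g_mean(Y_TRUE, ALWAYS_CLEAN))  # prints 0.0, although accuracy is 80%
```

Oversampling is applied only to the training split, so the evaluation data keeps its natural imbalance.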

An empirical analysis of the effectiveness of software metrics and fault prediction model for identifying faulty classes

Computer Standards & Interfaces

Software fault prediction models are used to predict faulty modules at an early stage of the software development life cycle. Predicting fault proneness using source code metrics is an area that has attracted the attention of many researchers. The performance of a model that assesses fault proneness depends on the source code metrics used as its input. In this work, we propose a framework to validate source code metrics and identify a suitable set of them, with the aim of removing irrelevant features and improving the performance of the fault prediction model. Initially, we applied a t-test analysis and univariate logistic regression analysis to each source code metric to evaluate its potential for predicting fault proneness. Next, we performed a correlation analysis and multivariate linear regression with stepwise forward selection to find the right set of source code metrics for fault prediction. The resulting set of source code metrics is used as input to develop a fault prediction model using a neural network with five different training algorithms and three different ensemble methods. The effectiveness of the developed fault prediction models is evaluated using a proposed cost evaluation framework. We performed experiments on fifty-six open-source Java projects. The experimental results reveal that the model developed using the set of source code metrics selected by the suggested validation framework achieves better results than using all other metrics. The experimental results also demonstrate that the fault prediction model is best suited for projects whose percentage of faulty classes is below a threshold value that depends on fault identification efficiency (low: 48.89%, median: 39.26%, high: 27.86%).
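The first two validation stages can be sketched as follows:

- Screen each metric with a t-test between faulty and non-faulty classes.
- Walk from the strongest to the weakest metric, dropping any metric highly correlated with one already kept.

This is a simplified sketch, not the paper's framework. It uses Welch's t-test variant and omits the univariate logistic regression and stepwise selection steps. The thresholds `t_min`/`r_max` and the metric values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence

def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for the difference in means of two samples."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_metrics(metrics: Dict[str, List[float]], labels: List[int],
                   t_min: float = 2.0, r_max: float = 0.8) -> List[str]:
    # Stage 1: keep metrics whose means differ strongly between faulty and clean classes.
    scores = {}
    for name, values in metrics.items():
        faulty = [v for v, lab in zip(values, labels) if lab == 1]
        clean = [v for v, lab in zip(values, labels) if lab == 0]
        t = welch_t(faulty, clean)
        if abs(t) >= t_min:
            scores[name] = abs(t)
    # Stage 2: strongest first; drop a metric if it is highly correlated with one already kept.
    kept: List[str] = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if all(abs(pearson(metrics[name], metrics[k])) <= r_max for k in kept):
            kept.append(name)
    return kept

# Hypothetical class-level metrics for 8 classes; label 1 = faulty class.
METRICS = {
    "loc": [100, 120, 130, 110, 400, 420, 390, 410],
    "wmc": [5, 6, 7, 5, 20, 22, 19, 21],   # nearly redundant with loc
    "noc": [1, 0, 2, 1, 1, 2, 0, 1],       # unrelated to the label
}
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
print(select_metrics(METRICS, LABELS))  # prints ['loc']
```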

A Systematic Differential Analysis for Fast and Robust Detection of Software Aging

2014 IEEE 33rd International Symposium on Reliable Distributed Systems, 2014

Software systems running continuously for a long time often suffer from software aging, the phenomenon of progressive degradation of the execution environment caused by latent software faults. Removing such faults during the software development process is crucial for system reliability. A major known obstacle is typically the long latency before the existence of software aging is discovered. We propose a systematic approach to detect software aging that requires shorter test time and achieves higher accuracy than traditional aging detection via stress testing and trend detection. The approach is based on a differential analysis in which a software version under test is compared against a previous version in terms of behavioral changes in resource metrics. A key instrument adopted is a divergence chart, which expresses time-dependent differences between two signals. Our experimental study focuses on memory-leak detection and evaluates divergence charts computed using multiple statistical techniques paired with application-level memory-related metrics (RSS and heap usage). The results show that the proposed method achieves good performance for memory-leak detection in comparison to techniques widely adopted in previous works (e.g., linear regression, moving average, and median).
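The differential idea can be illustrated with a deliberately simple divergence signal. Two memory traces are collected under the same workload, one from the previous version and one from the version under test. The divergence is a trailing moving average of their difference, and an alarm is raised when it stays above a threshold for several consecutive points.

This is a hypothetical stand-in, not the paper's divergence chart or any of its statistical techniques. The window, threshold, run length, and traces are all assumptions.

```python
from typing import List, Optional, Sequence

def divergence_series(baseline: Sequence[float], candidate: Sequence[float],
                      window: int = 5) -> List[float]:
    """Trailing moving average of (candidate - baseline), one value per full window."""
    diffs = [c - b for b, c in zip(baseline, candidate)]
    return [sum(diffs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(diffs))]

def first_alarm(divergence: Sequence[float], threshold: float,
                consecutive: int = 3) -> Optional[int]:
    """Index of the first point where divergence exceeded `threshold` `consecutive` times in a row."""
    run = 0
    for i, d in enumerate(divergence):
        run = run + 1 if d > threshold else 0
        if run >= consecutive:
            return i
    return None

# Hypothetical RSS traces (MB) sampled at a fixed interval under the same workload.
OLD_VERSION = [100.0] * 30
NEW_VERSION = [100.0 + 0.5 * t for t in range(30)]  # leaks 0.5 MB per sample
print(first_alarm(divergence_series(OLD_VERSION, NEW_VERSION), threshold=1.0))  # prints 3
```

The returned index refers to the divergence series. Add `window - 1` to map it back to a sample index of the raw traces.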

A methodology for detection and estimation of software aging

1998

The phenomenon of software aging refers to the accumulation of errors during the execution of the software, which eventually results in its crash/hang failure. A gradual performance degradation may also accompany software aging. Proactive fault management techniques such as “software rejuvenation” (Y. Huang et al., 1995) may be used to counteract aging if it exists. We propose a methodology for the detection and estimation of aging in the UNIX operating system. First, we present the design and implementation of an SNMP-based, distributed monitoring tool used to collect operating system resource usage and system activity data at regular intervals from networked UNIX workstations. Statistical trend detection techniques are applied to this data to detect/validate the existence of aging. To quantify the effect of aging on operating system resources, we propose a metric, “estimated time to exhaustion”, which is calculated using well-known slope estimation techniques. Although the distributed data collection tool is specific to UNIX, the statistical techniques can be used for the detection and estimation of aging in other software as well.
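The sketch below illustrates trend detection followed by a time-to-exhaustion estimate. It uses two widely known techniques: the Mann-Kendall test for trend detection and Sen's median-of-pairwise-slopes estimator for the slope. These are assumed choices; the abstract does not name the techniques it used.

- The z score uses the no-ties variance formula.
- Time to exhaustion is extrapolated from the last sample to zero, an assumed definition.
- The free-memory trace is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

def mann_kendall(x: Sequence[float]) -> Tuple[int, float]:
    """Mann-Kendall S statistic and its normal-approximation z score (no tie correction)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # +1 for an increase, -1 for a decrease
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

def sen_slope(t: Sequence[float], x: Sequence[float]) -> float:
    """Median of all pairwise slopes (Sen's estimator)."""
    slopes = [(x[j] - x[i]) / (t[j] - t[i])
              for i in range(len(x)) for j in range(i + 1, len(x)) if t[j] != t[i]]
    return statistics.median(slopes)

def time_to_exhaustion(t: Sequence[float], free: Sequence[float]) -> float:
    """Time from the last sample until `free` reaches zero at the Sen slope; inf if not decreasing."""
    slope = sen_slope(t, free)
    return math.inf if slope >= 0 else free[-1] / -slope

# Hypothetical free-memory samples (MB) taken hourly.
HOURS = [0, 1, 2, 3, 4, 5]
FREE_MB = [1000, 980, 960, 940, 920, 900]
S, Z = mann_kendall(FREE_MB)
print(S, round(Z, 2), time_to_exhaustion(HOURS, FREE_MB))
```

A z score below about -1.96 would indicate a significant decreasing trend at the 5% level, which is the precondition for trusting the exhaustion estimate.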

Is software aging related to software metrics?

2010

This work presents an empirical analysis investigating what kind of relationship exists between software aging and several static features of the software. While past studies on software aging focused on predicting aging effects by monitoring and analytically modeling resource consumption at runtime, this study explores whether the static features of the software, as derived from its source code, present potential relationships with software aging. We adopt a set of common software metrics concerning program structure, such as size and cyclomatic complexity, along with some features specifically developed for this study; the metrics were computed from ten complex software applications affected by aging. A statistical analysis was then carried out to infer their relationship with software aging. The results encourage further investigation in this direction, since they show that software aging effects are related to the static features of software.
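The abstract does not say which statistical analysis was used. One common non-parametric option for relating a static metric to an aging indicator across a small number of applications is Spearman's rank correlation, sketched below; tied values receive average ranks.

The choice of Spearman is an assumption, not the study's analysis. So are the per-application "aging trend" indicator and the five data points.

```python
import math
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))

# Hypothetical: one metric value and one aging indicator per application.
METRIC = [10, 20, 30, 40, 50]
AGING_TREND = [0.1, 0.4, 0.3, 0.8, 0.9]
print(spearman(METRIC, AGING_TREND))  # prints 0.9
```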

Software ageing measurement and classification using Goal Question Metric (GQM) approach

As indicated by earlier research, software products behave in a manner very similar to the process of human ageing. Just as human beings grow old, it is believed that software too can age; even though ageing is unavoidable, it is possible to understand the reasons for its occurrence so that steps can be taken to control its impact. Being a logical product, software does not age in the physical sense. However, in certain situations, software loses its significance and quality relative to its surroundings. This occurrence can be compared to the process of ageing. The software ageing function can be expressed in terms of its significance, frequency of failure, technology, costs, and so on. The software can be revived and the process of ageing postponed if these factors can be ascertained and distinguished. Inspired by earlier works concerning software certification and value, we have been guided to develop a software ageing model together with its associated links such as agei...

Software aging prediction: a new approach

International Journal of Electrical and Computer Engineering (IJECE), 2023

To meet users' requirements, which have become very diverse in recent years, computing infrastructure has become complex. One example of such infrastructure is a cloud-based system. These systems suffer from resource exhaustion in the long run, which leads to performance degradation. This phenomenon is called software aging. Software aging needs to be predicted so that pre-emptive rejuvenation, which enhances service availability, can be carried out. Software rejuvenation is the technique that refreshes the system and brings it back to a healthy state. Hence, software aging should be predicted in advance to trigger the rejuvenation process and improve service availability. In this work, a new approach based on the k-nearest neighbor (k-NN) algorithm has been used to identify the virtual machine's status, and a prediction of resource exhaustion time has been made. The proposed prediction model uses static thresholding and adaptive thresholding methods. The performance of the algorithms is compared, and for classification k-NN performs comparatively better, with an accuracy of 97.6%, whereas its counterparts achieved accuracies of 96.0% (naïve Bayes) and 92.8% (decision tree). A comparison of the proposed work with previous similar works is also discussed.
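The classification step can be sketched with a plain k-NN classifier that labels a virtual machine's status by majority vote among its nearest training samples. This is a minimal sketch, not the paper's model:

- The two features (memory and swap utilization, in percent) are hypothetical.
- The "healthy"/"aging" labels, k=3, and the training points are also hypothetical.
- The paper's thresholding and exhaustion-time prediction are not shown.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Sample = Tuple[Tuple[float, float], str]

def knn_classify(train: Sequence[Sample], query: Tuple[float, float], k: int = 3) -> str:
    """Majority label among the k training points closest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical VM observations: (memory utilization %, swap utilization %) -> status.
TRAIN = [
    ((30, 0), "healthy"), ((35, 2), "healthy"), ((40, 1), "healthy"), ((32, 0), "healthy"),
    ((85, 20), "aging"), ((90, 35), "aging"), ((88, 25), "aging"), ((92, 40), "aging"),
]
print(knn_classify(TRAIN, (33, 1)), knn_classify(TRAIN, (89, 30)))  # prints healthy aging
```

An odd k avoids ties between two classes. In practice, features on different scales would also be normalized before computing distances.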

Using Source Code Metrics and Ensemble Methods for Fault Proneness Prediction

2017

Software fault prediction models are employed to optimize testing resource allocation by identifying fault-prone classes before the testing phases. Several researchers have validated the use of different classification techniques to develop predictive models for fault prediction. The performance of such statistical models has been shown to be influenced by the training and testing datasets. Ensemble learning methods have been widely used because they combine the capabilities of their constituent models on a dataset to achieve potentially higher performance than individual models (improved generalizability). In the study presented in this paper, three different ensemble methods have been applied to develop a model for predicting fault proneness. The efficacy and usefulness of a fault prediction model also depend on the source code metrics used as its input. In this paper, we propose a framework to validate the source code metrics ...
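As a hedged illustration of how an ensemble combines its constituent models, the sketch below uses majority voting, one common combination rule. The abstract does not name its three ensemble methods, so this is not a claim about them. The three base "models" are hypothetical single-metric threshold rules standing in for trained classifiers.

```python
from typing import Callable, Dict, List, Sequence

Classifier = Callable[[Dict[str, float]], int]

def majority_vote(models: Sequence[Classifier], x: Dict[str, float]) -> int:
    """Predict 1 (fault-prone) if more than half of the constituent models say so."""
    votes = sum(model(x) for model in models)
    return 1 if 2 * votes > len(models) else 0

# Hypothetical base models: each is a threshold rule on one source code metric.
MODELS: List[Classifier] = [
    lambda m: int(m["loc"] > 500),
    lambda m: int(m["wmc"] > 20),
    lambda m: int(m["cbo"] > 10),
]

CLASS_METRICS = {"loc": 800, "wmc": 25, "cbo": 4}
print(majority_vote(MODELS, CLASS_METRICS))  # prints 1 (two of the three rules fire)
```

Any callable that maps a metric dictionary to 0 or 1, such as a trained neural network wrapped in a function, could replace the lambdas without changing `majority_vote`.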
