Software Defect Prediction Using Static Code Metrics: Formulating a Methodology

Evaluating defect prediction approaches: a benchmark and an extensive comparison

Empirical Software Engineering, 2012

Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches.

A general software defect-proneness prediction framework

IEEE Transactions on Software Engineering, 2011

BACKGROUND-Predicting defect-prone software components is an economically important activity and so has received a good deal of attention. However, making sense of the many, and sometimes seemingly inconsistent, results is difficult. OBJECTIVE-We propose and evaluate a general framework for software defect prediction that supports 1) unbiased and 2) comprehensive comparison between competing prediction systems. METHOD-The framework comprises 1) scheme evaluation and 2) defect prediction components. The scheme evaluation analyzes the prediction performance of competing learning schemes for given historical data sets. The defect predictor builds a model according to the evaluated learning scheme and uses that model to predict software defects in new data. To demonstrate the performance of the proposed framework, we use both simulation and publicly available software defect data sets. RESULTS-The results show that we should choose different learning schemes for different data sets (i.e., no scheme dominates), that small details in how evaluations are conducted can completely reverse findings, and, last, that our proposed framework is more effective and less prone to bias than previous approaches. CONCLUSIONS-Failure to properly or fully evaluate a learning scheme can be misleading; however, these problems may be overcome by our proposed framework.
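A minimal sketch of the two-stage idea, assuming scikit-learn and synthetic placeholder data (the candidate schemes and array names are illustrative, not the paper's): candidate learning schemes are first compared by cross-validated AUC on historical data only, and the winner is then retrained on all historical data to score new modules.

```python
# Scheme evaluation vs. defect prediction, per the two-component framing.
# scikit-learn assumed; data arrays are synthetic placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_hist = rng.normal(size=(200, 20))       # historical static code metrics
y_hist = rng.integers(0, 2, size=200)     # 1 = defective module
X_new = rng.normal(size=(50, 20))         # metrics for not-yet-labeled modules

schemes = {
    "scaled naive Bayes": make_pipeline(StandardScaler(), GaussianNB()),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Scheme evaluation: compare schemes on historical data only.
scores = {name: cross_val_score(clf, X_hist, y_hist, cv=10,
                                scoring="roc_auc").mean()
          for name, clf in schemes.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)

# Defect prediction: retrain the chosen scheme and score the new modules.
predictor = schemes[best].fit(X_hist, y_hist)
print(predictor.predict(X_new)[:10])
```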

Software defect prediction using static code metrics underestimates defect-proneness

International Symposium on Neural Networks, 2010

Many studies have been carried out to predict the presence of software code defects using static code metrics. Such studies typically report how a classifier performs with real-world data, but usually no analysis of the predictions is carried out. An analysis of this kind may be worthwhile as it can illuminate the motivation behind the predictions and the severity…

An empirical study on software defect prediction with a simplified metric set

Context: Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within- and cross-project defect prediction when available historical data are insufficient remain unclear. Objective: The objective of this work is to validate the feasibility of the predictor built with a simplified metric set for software defect prediction in different scenarios, and to investigate practical guidelines for the choice of training data, classifier and metric subset of a given project. Method: First, based on six typical classifiers, three types of predictors using metric sets of different sizes were constructed in three scenarios. Then, we validated the performance of the predictor built with Top-k metrics using statistical methods. Finally, we attempted to minimize the Top-k metric subset by removing redundant metrics, and we tested the stability of such a minimum metric subset with one-way ANOVA tests. Results: The study has been conducted on 34 releases of 10 open-source projects available at the PROMISE repository. The findings indicate that the predictors built with either Top-k metrics or the minimum metric subset can provide an acceptable result compared with benchmark predictors. A guideline for choosing a suitable simplified metric set in different scenarios is also presented. Conclusion: The experimental results indicate that (1) the choice of training data for defect prediction should depend on the specific requirement of accuracy; (2) the predictor built with a simplified metric set works well and is very useful in case limited resources are supplied; (3) simple classifiers (e.g., Naïve Bayes) also tend to perform well when using a simplified metric set for defect prediction; and (4) in several cases, the minimum metric subset can be identified to facilitate the procedure of general defect prediction with acceptable loss of prediction precision in practice.
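As a rough illustration of the Top-k idea, here is a sketch assuming scikit-learn, with mutual information standing in for the paper's metric-ranking method and synthetic data in place of the PROMISE releases:

```python
# Top-k simplified metric set: rank metrics with a filter score, keep the k
# best, and train a simple classifier on them. Data is a synthetic stand-in.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))              # 20 static code metrics
y = (X[:, 0] + X[:, 3] + rng.normal(size=300) > 0).astype(int)

for k in (3, 5, 20):                        # k=20 is the full metric set
    model = make_pipeline(SelectKBest(mutual_info_classif, k=k), GaussianNB())
    auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()
    print(f"top-{k} metrics: AUC = {auc:.3f}")
```

With only two truly informative metrics in this toy setup, the small subsets typically match the full set, which is the kind of "acceptable loss" trade-off the abstract describes.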

How repeated data points affect bug prediction performance: A case study

Applied Soft Computing, 2016

In defect prediction studies, open-source and real-world defect data sets are frequently used. The quality of these data sets is one of the main factors affecting the validity of defect prediction methods. One such issue is repeated data points in defect prediction data sets. The main goal of the paper is to explore how low-level metrics are derived. The paper also presents a cleansing algorithm that removes repeated data points from defect data sets. The method was applied to 20 data sets, including five open-source sets, and the area under the curve (AUC) and precision performance measures improved by 4.05% and 6.7%, respectively. In addition, this work discusses how static code metrics should be used in bug prediction. The study provides tips for obtaining better defect prediction results.
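A minimal sketch of the cleansing step, assuming pandas and scikit-learn with a synthetic frame in place of a real defect data set. It also shows why repeats matter: duplicated rows can land in both training and test folds and inflate cross-validated scores, so dropping them gives a more honest estimate.

```python
# Repeated rows leak between cross-validation folds and inflate scores;
# dropping them gives a more honest estimate. A 1-NN classifier is used
# because it memorizes exact repeats, which makes the effect visible.
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
base = pd.DataFrame(rng.normal(size=(150, 5)),
                    columns=["loc", "v(g)", "ev(g)", "iv(g)", "n"])
base["defective"] = rng.integers(0, 2, size=150)
data = pd.concat([base, base.sample(60, random_state=2)])  # inject repeats

for name, frame in [("raw", data), ("cleansed", data.drop_duplicates())]:
    X, y = frame.drop(columns="defective"), frame["defective"]
    auc = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y,
                          cv=10, scoring="roc_auc").mean()
    print(f"{name}: {len(frame)} rows, AUC = {auc:.3f}")
```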

A Critique of Software Defect Prediction Models

IEEE Transactions on Software Engineering, 1999

Many organizations want to predict the number of defects (faults) in software systems, before they are deployed, to gauge the likely delivered quality and maintenance effort. To help in this, numerous software metrics and statistical models have been developed, with a correspondingly large literature. We provide a critical review of this literature and the state of the art. Most of the wide range of prediction models use size and complexity metrics to predict defects. Others are based on testing data, the "quality" of the development process, or take a multivariate approach. The authors of the models have often made heroic contributions to a subject otherwise bereft of empirical studies. However, there are a number of serious theoretical and practical problems in many studies. The models are weak because of their inability to cope with the, as yet, unknown relationship between defects and failures. There are fundamental statistical and data quality problems that undermine model validity. More significantly, many prediction models tend to model only part of the underlying problem and seriously misspecify it. To illustrate these points, the "Goldilocks Conjecture" (that there is an optimum module size) is used to show the considerable problems inherent in current defect prediction approaches. Careful and considered analysis of past and new results shows that the conjecture lacks support and that some models are misleading. We recommend holistic models for software defect prediction, using Bayesian Belief Networks, as alternative approaches to the single-issue models used at present. We also argue for research into a theory of "software decomposition" in order to test hypotheses about defect introduction and help construct a better science of software engineering.

The impact of software metrics in NASA metric data program dataset modules for software defect prediction

TELKOMNIKA Telecommunication Computing Electronics and Control, 2024

This paper discusses software metrics and their impact on software defect prediction in the NASA metric data program (MDP) dataset. The NASA MDP dataset covers four categories of software metrics: Halstead, McCabe, LoC, and miscellaneous (misc). However, no study has shown which metrics contribute to increasing the area under the curve (AUC) on the NASA MDP dataset. This study uses 12 modules from the NASA MDP dataset, each tested against 14 combinations of software metrics drawn from the four metric categories. Classification is then performed using the k-nearest neighbor (kNN) method. The research concludes that the choice of software metrics has a significant impact on the AUC, with the LoC+McCabe+misc combination driving the largest improvement in AUC. Conversely, the combination most associated with suboptimal AUC values is McCabe. The Halstead metrics also tend to degrade the performance of other metrics when combined with them.
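A sketch of this experimental design, assuming scikit-learn; the column groupings, data, and exhaustive enumeration (15 combinations here rather than the study's 14) are illustrative:

```python
# Evaluate kNN AUC over combinations of metric families, mirroring the
# category-combination design. Column groupings and data are invented.
from itertools import combinations
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 9))
y = rng.integers(0, 2, size=400)
groups = {"loc": [0, 1], "mccabe": [2, 3, 4], "halstead": [5, 6, 7], "misc": [8]}

for r in range(1, len(groups) + 1):
    for combo in combinations(groups, r):
        cols = [c for g in combo for c in groups[g]]
        knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
        auc = cross_val_score(knn, X[:, cols], y, cv=5, scoring="roc_auc").mean()
        print("+".join(combo), f"AUC = {auc:.3f}")
```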

Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings

IEEE Transactions on Software Engineering, 2008

Software defect prediction strives to improve software quality and testing efficiency by constructing predictive classification models from code attributes to enable a timely identification of fault-prone modules. Several classification models have been evaluated for this task. However, due to inconsistent findings regarding the superiority of one classifier over another and the usefulness of metric-based classification in general, more research is needed to improve convergence across studies and further advance confidence in experimental results. We consider three potential sources of bias: comparing classifiers over one or a small number of proprietary data sets, relying on accuracy indicators that are conceptually inappropriate for software defect prediction and cross-study comparisons, and, finally, limited use of statistical testing procedures to secure empirical findings. To remedy these problems, a framework for comparative software defect prediction experiments is proposed and applied in a large-scale empirical comparison of 22 classifiers over 10 public domain data sets from the NASA Metrics Data repository. Overall, an appealing degree of predictive accuracy is observed, which supports the view that metric-based classification is useful. However, our results indicate that the importance of the particular classification algorithm may be less than previously assumed since no significant performance differences could be detected among the top 17 classifiers.
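A scaled-down sketch of this protocol, assuming scikit-learn and SciPy, with three classifiers and synthetic data standing in for the paper's 22 classifiers and 10 NASA sets: cross-validated AUC per data set, followed by a Friedman test across data sets.

```python
# Benchmark several classifiers over several data sets by cross-validated
# AUC, then test for performance differences with a Friedman test.
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
datasets = [(rng.normal(size=(200, 10)), rng.integers(0, 2, 200))
            for _ in range(10)]
clfs = {"NB": GaussianNB(),
        "LogReg": LogisticRegression(max_iter=1000),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0)}

# One mean AUC per (classifier, data set) pair.
aucs = {name: [cross_val_score(c, X, y, cv=5, scoring="roc_auc").mean()
               for X, y in datasets] for name, c in clfs.items()}
stat, p = friedmanchisquare(*aucs.values())
print({k: round(float(np.mean(v)), 3) for k, v in aucs.items()}, f"p = {p:.3f}")
```

A non-significant Friedman p-value across classifiers is exactly the kind of result behind the paper's conclusion that the choice among top classifiers matters less than assumed.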

Improved software defect prediction

2005

Although a number of approaches have been taken to quality prediction for software, none have achieved widespread applicability. This paper describes a single model to combine the diverse forms of, often causal, evidence available in software development in a more natural and efficient way than done previously. We use Bayesian Networks as the appropriate formalism for representing defect introduction, detection and removal processes throughout any life-cycle. The approach combines subjective judgements from experienced project managers with available defect rate data to produce a risk map, which is used to forecast and control defect rates. Moreover, the risk map more naturally mirrors real-world influences without any distracting mathematical formality. The paper focuses on the extensive validation of the approach within Philips Consumer Electronics (dozens of diverse projects across Philips internationally). The resulting model (packaged within a commercial software tool, AgenaRisk, usable by project managers) is now being used to predict defect rates at various testing and operational phases. The results of the validation confirm that the approach is scalable, robust and more accurate than can be achieved using classical methods. We have found a 95% correlation between actual and predicted defects. The defect prediction models incorporate cutting-edge ideas and results from software metrics and process improvement research and package them as risk templates that can be applied either off-the-shelf or after calibrating them to local conditions and the software development processes in use.
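A toy sketch of the Bayesian-network formalism, assuming the pgmpy library; the structure and probabilities are invented for illustration and are not the Philips/AgenaRisk model:

```python
# Toy defect-introduction/detection network: process quality drives defects
# introduced; residual defects depend on defects introduced and on testing.
# pgmpy is assumed; all nodes, states and CPDs here are hypothetical.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("process_quality", "defects_in"),
                         ("defects_in", "residual"),
                         ("testing", "residual")])
model.add_cpds(
    TabularCPD("process_quality", 2, [[0.6], [0.4]]),          # 0 good, 1 poor
    TabularCPD("testing", 2, [[0.5], [0.5]]),                  # 0 thorough, 1 light
    TabularCPD("defects_in", 2, [[0.8, 0.3], [0.2, 0.7]],      # 0 low, 1 high
               evidence=["process_quality"], evidence_card=[2]),
    TabularCPD("residual", 2,
               [[0.95, 0.80, 0.60, 0.20],                      # P(residual = low)
                [0.05, 0.20, 0.40, 0.80]],
               evidence=["defects_in", "testing"], evidence_card=[2, 2]))
assert model.check_model()

# Combine a subjective judgement (poor process) with an observation
# (thorough testing) to forecast residual defects.
posterior = VariableElimination(model).query(
    variables=["residual"], evidence={"process_quality": 1, "testing": 0})
print(posterior)
```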