The impact of software metrics in NASA metric data program dataset modules for software defect prediction
Related papers
An empirical study on software defect prediction with a simplified metric set
Context: Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within- and cross-project defect prediction when available historical data are insufficient remain unclear. Objective: The objective of this work is to validate the feasibility of a predictor built with a simplified metric set for software defect prediction in different scenarios, and to investigate practical guidelines for the choice of training data, classifier and metric subset for a given project. Method: First, based on six typical classifiers, three types of predictors using metric sets of different sizes were constructed in three scenarios. Then, we validated the acceptable performance of the predictor based on Top-k metrics using statistical methods. Finally, we attempted to minimize the Top-k metric subset by removing redundant metrics, and we tested the stability of such a minimum metric subset with one-way ANOVA tests. Results: The study was conducted on 34 releases of 10 open-source projects available in the PROMISE repository. The findings indicate that predictors built with either the Top-k metrics or the minimum metric subset can provide acceptable results compared with benchmark predictors. A guideline for choosing a suitable simplified metric set in different scenarios is also presented.
Conclusion: The experimental results indicate that (1) the choice of training data for defect prediction should depend on the specific requirement of accuracy; (2) the predictor built with a simplified metric set works well and is very useful when only limited resources are available; (3) simple classifiers (e.g., Naïve Bayes) also tend to perform well when using a simplified metric set for defect prediction; and (4) in several cases, the minimum metric subset can be identified to facilitate general defect prediction with an acceptable loss of prediction precision in practice.
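The Top-k workflow this abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the data here is synthetic (a real replication would load the PROMISE release data), and the use of scikit-learn's ANOVA F-test ranking for the Top-k selection is an assumption for demonstration purposes.

```python
# Sketch: build a predictor from a Top-k metric subset and a simple
# classifier (Naive Bayes), in the spirit of the simplified-metric-set
# study above. Synthetic data stands in for PROMISE module metrics.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# Keep only the Top-k metrics, ranked here by an ANOVA F-test.
k = 3
selector = SelectKBest(f_classif, k=k).fit(X_tr, y_tr)
X_tr_k, X_te_k = selector.transform(X_tr), selector.transform(X_te)

# A simple classifier often remains competitive on the reduced set.
clf = GaussianNB().fit(X_tr_k, y_tr)
print(round(f1_score(y_te, clf.predict(X_te_k)), 3))
```

The same pipeline, rerun with redundant metrics removed from the Top-k subset, would correspond to the paper's minimum-metric-subset experiments.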
An Open-Source Software Metric Tool for Defect Prediction, Its Case Study and Lessons We Learned
The number of research papers on defect prediction has increased sharply over the last decade or so. One of the main driving forces behind this growth has been publicly available datasets for defect prediction, such as the PROMISE repository. These publicly available datasets make it possible for numerous researchers to conduct various experiments on defect prediction without having to collect data themselves. However, there are potential problems that have been ignored. First, there is a risk that the knowledge accumulated in the research community will, over time, overfit to the datasets that are repeatedly used in numerous studies. Second, as software development practices commonly employed in the field evolve over time, these changes may affect the relation between defect-proneness and software metrics, which would not be reflected in the existing datasets. These potential risks can be addressed to a significant degree if new datasets can be prepared easily. As a step toward that goal, we introduce an open-source software metric tool, SMD (Software Metric tool for Defect prediction), that can generate code metrics and process metrics for a given Java software project in a Git repository. In a case study comparing existing datasets with datasets regenerated from the same software projects using our tool, we found that the two are not identical, despite the fact that the metric values we obtained conform to the definitions of their corresponding metrics. We learned that there are subtle factors to consider when generating and using metrics for defect prediction.
Software Defect Prediction Framework Using Hybrid Software Metric
JOIV : International Journal on Informatics Visualization
Software fault prediction is widely used in the software development industry, and software development has accelerated significantly during the pandemic. However, the main problem is that most fault prediction models disregard object-oriented metrics, even though academic researchers concentrate on predicting software faults early in the development process. This research presents a procedure that uses object-oriented metrics to predict software faults at the class level, together with feature selection techniques, to assess the effectiveness of machine learning algorithms for fault prediction. The aim is to assess the effectiveness of software fault prediction using feature selection techniques. In the present work, software metrics have been used for defect prediction, and feature selection techniques were applied to select the best features from the dataset. The results show that process metrics achieved slightly better accuracy than code metrics.
A Review on Software Defect Prediction Techniques Using Product Metrics
International Journal of Database Theory and Application, 2017
Presently, the complexity and volume of software systems are increasing at a rapid rate. In some cases this improves performance and brings efficient outcomes, but in many situations it leads to elevated testing costs, meaningless results and inferior quality, even undermining the trustworthiness of the products. Fault prediction plays a vital role in enhancing software quality and helps software testing reduce cost and time. Conventionally, software metrics are used to describe the complexity of a program and to estimate its development effort, and extensive research has investigated using software metrics to forecast the number of faults in a module. This empirical study was conducted to identify the factors that significantly improve fault prediction models based on product metrics. The paper reviews various software metrics and the procedures proposed for enhancing software defect prediction, and summarizes those techniques.
Software quality is a field of study and practice that describes the desirable attributes of software products; ideally, software performs its functions without any defects. Software quality metrics are a subset of software metrics that focus on the quality aspects of the product, process, and project. A software defect prediction model helps in the early detection of defects, contributes to their efficient removal, and supports producing a quality software system based on several metrics. The main objective of this paper is to help developers identify defects based on existing software metrics using data mining techniques and thereby improve software quality. In this paper, various classification techniques employed in the literature for software defect prediction using software metrics are revisited.
Object-Oriented Metrics for Defect Prediction
2019
Today, defect prediction is an important part of the software industry's effort to meet product deadlines. Defect prediction techniques help organizations use their resources effectively, resulting in lower cost and time requirements. Various metrics are used for defect prediction in within-company (WC) and cross-company (CC) projects. In this paper, we used object-oriented metrics and a feed-forward neural network (FFNN) to build a defect prediction model for within-company and cross-company projects. The proposed model was tested on four datasets for both within-company defect prediction (WCDP) and cross-company defect prediction (CCDP), and gives good results for WCDP and CCDP compared to previous studies.
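A feed-forward network over object-oriented metrics, as described above, can be sketched with a standard multilayer perceptron. This is a hypothetical illustration: the metric columns, network size, and synthetic data are assumptions, not the paper's actual configuration.

```python
# Sketch: a feed-forward neural network defect predictor trained on
# (synthetic stand-ins for) object-oriented metrics such as the
# CK suite: WMC, DIT, NOC, CBO, RFC, LCOM.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=6,
                           n_informative=4, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=1)

# Scaling matters for neural networks; one small hidden layer here.
ffnn = make_pipeline(StandardScaler(),
                     MLPClassifier(hidden_layer_sizes=(8,),
                                   max_iter=1000, random_state=1))
ffnn.fit(X_tr, y_tr)
print(round(ffnn.score(X_te, y_te), 3))
```

For a cross-company (CCDP) setting, the train/test split above would instead separate projects from different organizations.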
A Comparison Framework of Classification Models for Software Defect Prediction
Software defects are expensive in terms of quality and cost. Accurate prediction of defect-prone software modules can help direct test effort, reduce costs, and improve software quality. Machine learning classification is a popular approach for predicting software defects, and various types of classification algorithms have been applied to the problem. However, there is no clear consensus on which algorithm performs best when individual studies are examined separately. In this research, a comparison framework is proposed that aims to benchmark the performance of a wide range of classification models within the field of software defect prediction. For this study, 10 classifiers were selected, applied to build classification models, and tested on 9 NASA MDP datasets. Area under the curve (AUC) is employed as the accuracy indicator in our framework to evaluate classifier performance, and Friedman and Nemenyi post hoc tests are used to test the significance of AUC differences between classifiers. The results show that logistic regression performs best on most NASA MDP datasets. Naïve Bayes, neural network, support vector machine and K* classifiers also perform well, while decision tree based classifiers tend to underperform, as do linear discriminant analysis and k-nearest neighbor.
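The significance-testing step of such a framework can be sketched as follows. The AUC values are illustrative placeholders, not the study's NASA MDP results, and only the Friedman omnibus test is shown; the Nemenyi post hoc comparison would typically come from an additional package such as scikit-posthocs.

```python
# Sketch: Friedman test over per-dataset AUC scores of several
# classifiers, as in the comparison framework described above.
from scipy.stats import friedmanchisquare

# One AUC per dataset (9 hypothetical datasets) for each classifier.
auc_logreg = [0.78, 0.81, 0.75, 0.80, 0.77, 0.79, 0.82, 0.76, 0.74]
auc_nb     = [0.74, 0.79, 0.72, 0.78, 0.75, 0.76, 0.80, 0.73, 0.71]
auc_tree   = [0.68, 0.70, 0.66, 0.71, 0.69, 0.67, 0.72, 0.65, 0.64]

# The Friedman test ranks classifiers within each dataset and asks
# whether the mean ranks differ significantly across classifiers.
stat, p = friedmanchisquare(auc_logreg, auc_nb, auc_tree)
print(f"chi2={stat:.2f}, p={p:.4f}")

# If p is below alpha, a Nemenyi post hoc test identifies which
# specific classifier pairs differ.
```

Because ranks rather than raw AUC values are compared, the test makes no normality assumption across the heterogeneous datasets.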
Towards Developing and Analysing Metric-Based Software Defect Severity Prediction Model
Cornell University - arXiv, 2022
In a critical software system, testers have to spend an enormous amount of time and effort maintaining the software due to the continuous occurrence of defects. Among such defects, some severe defects may adversely affect the software. To reduce a tester's time and effort, many machine learning models have been proposed in the literature that use documented defect reports to automatically predict the severity of defective software modules. In contrast to these traditional approaches, in this work we propose a metric-based software defect severity prediction (SDSP) model that uses a self-training semi-supervised learning approach to classify the severity of defective software modules. The approach is trained on a mixture of unlabelled and labelled defect severity data. The self-training uses a decision tree classifier to assign pseudo-class labels to the unlabelled instances. The predictions are promising, since self-training successfully assigns suitable class labels to the unlabelled instances. While numerous research studies have proposed prediction approaches and addressed the methodological aspects of defect severity prediction models, the gap in estimating project attributes from the prediction model remains unresolved. To bridge this gap, we propose five project-specific measures, namely the Risk-Factor (RF), the Percent of Saved Budget (PSB), the Loss in the Saved Budget (LSB), the Remaining Service Time (RST) and the Gratuitous Service Time (GST), to capture project outcomes from the predictions. Like the traditional measures, these measures are calculated from the observed confusion matrix, and they are used to analyse the impact that the prediction model has on the software project.
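The self-training setup described above, a decision tree pseudo-labelling unlabelled severity instances, can be sketched with scikit-learn's generic self-training wrapper. This is an assumed, simplified stand-in for the SDSP model: the data is synthetic and the confidence threshold is arbitrary.

```python
# Sketch: self-training semi-supervised classification with a
# decision tree base classifier on a mix of labelled and unlabelled
# severity data; -1 marks the unlabelled instances.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10,
                           n_classes=3, n_informative=5,
                           random_state=0)
y_mixed = y.copy()
rng = np.random.RandomState(0)
y_mixed[rng.rand(len(y)) < 0.7] = -1   # hide 70% of the labels

# Over successive iterations, the tree assigns pseudo-labels to
# unlabelled instances it predicts with high confidence, then
# retrains on the enlarged labelled set.
model = SelfTrainingClassifier(DecisionTreeClassifier(max_depth=5),
                               threshold=0.9)
model.fit(X, y_mixed)
print(model.predict(X[:5]))
```

The five project-specific measures (RF, PSB, LSB, RST, GST) would then be computed from the confusion matrix of these predictions against the observed severities.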
Predicting Software Faults Using Software Metrics: A Review
This paper presents a survey of different techniques used to predict software faults, studying their advantages and limitations. Software metrics are essential for predicting software defects and thus enhancing software quality while keeping costs and effort to a minimum. As there has been a gap between academia and industry in this field, a technique is examined for bridging that gap by studying practical implementations that are feasible for industry. Various papers have been studied, and the proposed methods have been analyzed for possible shortcomings and enhancements. The results and findings of various authors are channelled to examine the problems and their possible solutions. The approach of Bayesian Belief Nets is discussed at the end of the paper as a candidate for improved decision making in industry whenever uncertainty reigns over decision making.