Defect Prediction in Class Level Metric Aggregation Using Data Mining Techniques (original) (raw)

A Study on Software Metrics based Software Defect Prediction using Data Mining and Machine Learning Techniques

Software quality is a field of study and practice that describes the desirable attributes of software products. The performance must be perfect without any defects.Software quality metrics are a subset of software metrics that focus on the quality aspects of the product, process, and project.The software defectprediction model helps in early detection of defects and contributes to their efficient removal and producing a quality software system based on several metrics. The main objective of paper is to help developers identify defects based on existing software metrics using data mining techniques and thereby improve the software quality.In this paper, variousclassification techniquesare revisitedwhich are employed for software defect prediction using software metrics in the literature.

Weighted software metrics aggregation and its application to defect prediction

Empirical Software Engineering

It is a well-known practice in software engineering to aggregate software metrics to assess software artifacts for various purposes, such as their maintainability or their proneness to contain bugs. For different purposes, different metrics might be relevant. However, weighting these software metrics according to their contribution to the respective purpose is a challenging task. Manual approaches based on experts do not scale with the number of metrics. Also, experts get confused if the metrics are not independent, which is rarely the case. Automated approaches based on supervised learning require reliable and generalizable training data, a ground truth, which is rarely available. We propose an automated approach to weighted metrics aggregation that is based on unsupervised learning. It sets metrics scores and their weights based on probability theory and aggregates them. To evaluate the effectiveness, we conducted two empirical studies on defect prediction, one on ca. 200 000 code...

Software Defect Prediction Framework Using Hybrid Software Metric

JOIV : International Journal on Informatics Visualization

Software fault prediction is widely used in the software development industry. Moreover, software development has accelerated significantly during this epidemic. However, the main problem is that most fault prediction models disregard object-oriented metrics, and even academician researcher concentrate on predicting software problems early in the development process. This research highlights a procedure that includes an object-oriented metric to predict the software fault at the class level and feature selection techniques to assess the effectiveness of the machine learning algorithm to predict the software fault. This research aims to assess the effectiveness of software fault prediction using feature selection techniques. In the present work, software metric has been used in defect prediction. Feature selection techniques were included for selecting the best feature from the dataset. The results show that process metric had slightly better accuracy than the code metric.

Investigating the Effect of Software Metrics Aggregation on Software Fault Prediction

2018

In inter-releases software fault prediction, the data from the previous version of the software that is used for training the classifier might not always be of same granularity as that of the testing data. The same scenario may also happen in the cross project software fault prediction. So, one major issue in it can be the difference in granularity i.e. training and testing datasets may not have the metrics at the same level. Thus, there is a need to bring the metrics at the same level. In this paper, aggregation using Average Absolute Deviation (AAD) and Interquartile Range (IQR) are explored. We propose the method for aggregation of metrics from class to package level for software fault prediction and validated the approach by performing experimental analysis. We did the experimental study to analyze the performance of software fault prediction mechanism when no aggregation technique was used and when the two mentioned aggregation techniques were used. The experimental study revealed that the aggregation improved the performance and out of AAD and IQR aggregation techniques, IQR performs relatively better.

An empirical study on software defect prediction with a simplified metric set

Context: Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between within-and cross-project defect prediction when available historical data are insufficient remain unclear. Objective: The objective of this work is to validate the feasibility of the predictor built with a simplified metric set for software defect prediction in different scenarios, and to investigate practical guidelines for the choice of training data, classifier and metric subset of a given project. Method: First, based on six typical classifiers, three types of predictors using the size of software metric set were constructed in three scenarios. Then, we validated the acceptable performance of the predictor based on Top-k metrics in terms of statistical methods. Finally, we attempted to minimize the Top-k metric subset by removing redundant metrics, and we tested the stability of such a minimum metric subset with one-way ANOVA tests. Results: The study has been conducted on 34 releases of 10 open-source projects available at the PROM-ISE repository. The findings indicate that the predictors built with either Top-k metrics or the minimum metric subset can provide an acceptable result compared with benchmark predictors. The guideline for choosing a suitable simplified metric set in different scenarios is presented in . Conclusion: The experimental results indicate that (1) the choice of training data for defect prediction should depend on the specific requirement of accuracy; (2) the predictor built with a simplified metric set works well and is very useful in case limited resources are supplied; (3) simple classifiers (e.g., Naïve Bayes) also tend to perform well when using a simplified metric set for defect prediction; and (4) in several cases, the minimum metric subset can be identified to facilitate the procedure of general defect prediction with acceptable loss of prediction precision in practice.

An empirical analysis of the effectiveness of software metrics and fault prediction model for identifying faulty classes

Computer Standards & Interfaces

Software fault prediction models are used to predict faulty modules at the very early stage of software development life cycle. Predicting fault proneness using source code metrics is an area that has attracted several researchers' attention. The performance of a model to assess fault proneness depends on the source code metrics which are considered as the input for the model. In this work, we have proposed a framework to validate the source code metrics and identify a suitable set of source code metrics with the aim to reduce irrelevant features and improve the performance of the fault prediction model. Initially, we applied a t-test analysis and univariate logistic regression analysis to each source code metric to evaluate their potential for predicting fault proneness. Next, we performed a correlation analysis and multivariate linear regression stepwise forward selection to find the right set of source code metrics for fault prediction. The obtained set of source code metrics are considered as the input to develop a fault prediction model using a neural network with five different training algorithms and three different ensemble methods. The effectiveness of the developed fault prediction models are evaluated using a proposed cost evaluation framework. We performed experiments on fifty six Open Source Java projects. The experimental results reveal that the model developed by considering the selected set of source code metrics using the suggested source code metrics validation framework as the input achieves better results compared to all other metrics. The experimental results also demonstrate that the fault prediction model is best suitable for projects with faulty classes less than the threshold value depending on fault identification efficiency (low-48.89%, median-39.26%, and high-27.86%).

A Review on Software Defect Prediction Techniques Using Product Metrics

International Journal of Database Theory and Application, 2017

Presently, complexity and volume of software systems are increasing with a rapid rate. In some cases it improves performance and brings efficient outcome, but unfortunately in several situations it leads to elevated cost for testing, meaningless outcome and inferior quality, even there is no trustworthiness of the products. Fault prediction in software plays a vital role in enhancing the software excellence as well as it helps in software testing to decrease the price and time. Conventionally, to describe the difficulty and calculate the duration of the programming, software metrics can be utilized. To forecast the amount of faults in module and utilizing software metrics, an extensive investigation is performed. With the purpose of recognizing the causes which importantly enhances the fault prediction models related to product metrics, this empirical research is made. This paper visits various software metrics and suggested procedures through which software defect prediction is enhanced and also summarizes those techniques.

Implication of Data Mining and Machine Learning in Software Engineering Domain for Software Model, Quality and Defect Prediction

INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING and SYSTEM SOFTWARE, 2020

Software metrics have a direct link with measurement in software engineering. Correct measurement is the prior condition in any engineering fields, and software engineering is not an exception, as the size and complexity of software increases, manual inspe becomes a harder task. Most Software Engineers worry about the quality of software, how to measure and enhance its quality. The overall objective of this study was to asses and analysis’s software metrics used to measure the software prod Developers have attempted to improve software quality by mining and analyzing software data. In any phase of software development life cycle (SDLC), while huge amount of data is produced, some design, security, or software problems may analyzing software data helps to handle these problems and lead to more accurate and timely delivery of software projects. Various data mining and machine learning studies have been conducted to deal with software engineering tasks such as defect prediction, effort estimation, etc. This study shows the open issues and presents related solutions and recommendations in software engineering, applying data mining and machine learning techniques. Software quality is a field of study and practice that describes the desirable attributes of software products. The performance must be perfect without any defects. Software quality metrics are a subset of software metrics that f software defect prediction model helps in early detection of defects and contributes to their efficient removal and producing a quality software system based on several metrics. The main paper is to help developers identify defects based on existing software metrics using data mining techniques and thereby improve the software quality. In this paper, various classification are revisited which are employed for software defect prediction using software metrics in the literature.

Towards Developing and Analysing Metric-Based Software Defect Severity Prediction Model

Cornell University - arXiv, 2022

In a critical software system, the testers have to spend an enormous amount of time and effort to maintain the software due to the continuous occurrence of defects. Among such defects, some severe defects may adversely affect the software. To reduce the time and effort of a tester, many machine learning models have been proposed in the literature, which use the documented defect reports to automatically predict the severity of the defective software modules. In contrast to the traditional approaches, in this work we propose a metric-based software defect severity prediction (SDSP) model that uses a self-training semi-supervised learning approach to classify the severity of the defective software modules. The approach is constructed on a mixture of unlabelled and labelled defect severity data. The self-training works on the basis of a decision tree classifier to assign the pseudo-class labels to the unlabelled instances. The predictions are promising since the self-training successfully assigns the suitable class labels to the unlabelled instances. On the other hand, numerous research studies have covered proposing prediction approaches as well as the methodological aspects of defect severity prediction models, the gap in estimating project attributes from the prediction model remains unresolved. To bridge the gap, we propose five project specific measures such as the Risk-Factor (RF), the Percent of Saved Budget (PSB), the Loss in the Saved Budget (LSB), the Remaining Service Time (RST) and Gratuitous Service Time (GST) to capture project outcomes from the predictions. Similar to the traditional measures, these measures are also calculated from the observed confusion matrix. These measures are used to analyse the impact that the prediction model has on the software project.

ROLE OF DATA MINING CLASSIFICATION TECHNIQUE IN SOFTWARE DEFECT PREDICTION

Software defect prediction is the process of locating defective modules in software. Software quality may be a field of study and apply that describes the fascinating attributes of software package product. The performance should be excellent with none defects. Software quality metrics are a set of software package metrics that target the standard aspects of the product, process, and project. The software package defect prediction model helps in early detection of defects and contributes to their economical removal and manufacturing a top quality software package supported many metrics. The most objective of paper is to assist developers determine defects supported existing software package metrics victimization data mining techniques and thereby improve the software package quality. In this paper, role of various classification techniques in software defect prediction process are analyzed.