A Novel Multiple Ensemble Learning Models Based on Different Datasets for Software Defect Prediction
Related papers
Early Prediction of Software Defect using Ensemble Learning: A Comparative Study
Recently, early prediction of software defects using machine learning techniques has attracted increasing attention from researchers due to its importance in producing successful software. It also reduces the cost of software development and facilitates procedures for identifying the causes of defects and estimating the proportion of defect-prone software in the future. There is no conclusive evidence that a specific type of machine learning technique is more efficient and accurate at predicting software defects. However, some previous related work proposes ensemble learning techniques as a more accurate alternative. This paper introduces the resample technique with three types of ensemble learners (Boosting, Bagging, and Rotation Forest), using eight base learners tested on seven benchmark datasets from the PROMISE repository. Results indicate that ensemble techniques improve accuracy over single learners, especially Rotation Forest combined with the resample technique, for most of the algorithms used in the experiments.
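A minimal sketch of the setup this abstract describes: a resample step applied to a PROMISE-style defect dataset, then a single learner compared against Bagging and Boosting ensembles under cross-validation. The file name, column names, and simple bootstrap oversampling are illustrative assumptions (Rotation Forest has no scikit-learn implementation and is omitted); scikit-learn 1.2+ is assumed for the `estimator` parameter.

```python
import pandas as pd
from sklearn.utils import resample
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

data = pd.read_csv("cm1.csv")                     # hypothetical PROMISE dataset file
minority = data[data["defects"] == 1]
majority = data[data["defects"] == 0]

# Resample step: bootstrap the minority (defective) class up to the majority size.
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])
Xb, yb = balanced.drop(columns=["defects"]), balanced["defects"]

base = DecisionTreeClassifier(random_state=42)
models = {
    "single tree": base,
    "bagging": BaggingClassifier(estimator=base, n_estimators=50, random_state=42),
    "boosting": AdaBoostClassifier(estimator=base, n_estimators=50, random_state=42),
}
for name, model in models.items():
    acc = cross_val_score(model, Xb, yb, cv=10, scoring="accuracy").mean()
    print(f"{name}: accuracy = {acc:.3f}")
```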
An Ensemble Learning Approach for Software Defect Prediction in Developing Quality Software Product
Advances in Computing and Data Sciences, 2021
Software Defect Prediction (SDP) is a major research field in the software development life cycle. Accurate SDP assists software developers and engineers in developing a reliable software product. Several machine learning techniques for SDP have been reported in the literature, but most of these studies suffered in terms of prediction accuracy and other performance metrics. Many of them focus only on accuracy, which is not sufficient for measuring SDP performance. In this research, we propose seven ensemble machine learning models for SDP: CatBoost, Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), boosted CatBoost, bagged logistic regression, boosted LGBM, and boosted XGBoost were used for the experimental analysis. We also used logistic regression as a separate individual base model in the analysis of six datasets. This paper extends the evaluation beyond accuracy alone: the Area Under the Curve (AUC), precision, recall, F-measure, and Matthews Correlation Coefficient (MCC) were used as performance metrics. The results obtained showed that the proposed ensemble CatBoost model gave outstanding performance for all three defect datasets as a result of its ability to decrease overfitting and reduce training time.
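A hedged illustration of the evaluation protocol described above: three gradient-boosting families trained and scored not only with accuracy but also with AUC, precision, recall, F-measure, and MCC. Synthetic data stands in for the paper's defect datasets, and availability of the xgboost, lightgbm, and catboost packages is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, roc_auc_score, precision_score,
                             recall_score, f1_score, matthews_corrcoef)
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Imbalanced synthetic data standing in for a defect dataset.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "LGBM": LGBMClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]
    print(name,
          f"acc={accuracy_score(y_te, pred):.3f}",
          f"auc={roc_auc_score(y_te, proba):.3f}",
          f"prec={precision_score(y_te, pred):.3f}",
          f"rec={recall_score(y_te, pred):.3f}",
          f"f1={f1_score(y_te, pred):.3f}",
          f"mcc={matthews_corrcoef(y_te, pred):.3f}")
```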
Software defect prediction using ensemble learning on selected features
Information and Software Technology, 2015
Context: Several issues hinder software defect data, including redundancy, correlation, feature irrelevance, and missing samples. It is also hard to ensure a balanced distribution between data pertaining to defective and non-defective software; in most experimental cases, data for the latter class dominates the dataset. Objective: The objectives of this paper are to demonstrate the positive effects of combining feature selection and ensemble learning on the performance of defect classification. Along with efficient feature selection, a new two-variant (with and without feature selection) ensemble learning algorithm is proposed to provide robustness to both data imbalance and feature redundancy. Method: We carefully combine selected ensemble learning models with efficient feature selection to address these issues and mitigate their effects on the defect classification performance. Results: Forward selection showed that only a few features contribute to high area under the receiver operating characteristic curve (AUC). On the tested datasets, the greedy forward selection (GFS) method outperformed other feature selection techniques such as Pearson's correlation. This suggests that features are highly unstable. However, ensemble learners like random forests and the proposed algorithm, average probability ensemble (APE), are not as affected by poor features as weighted support vector machines (W-SVMs). Moreover, the APE model combined with greedy forward selection (enhanced APE) achieved AUC values of approximately 1.0 for the NASA datasets PC2, PC4, and MC1. Conclusion: This paper shows that features of a software dataset must be carefully selected for accurate classification of defective components. Furthermore, tackling the software data issues mentioned above with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
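A minimal sketch of the two mechanisms this abstract combines: greedy forward feature selection driven by AUC, followed by an average probability ensemble in which the predicted class probabilities of several learners are averaged. The learner pool, stopping rule, and data are illustrative assumptions and do not reproduce the published APE/GFS configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1500, n_features=15, n_informative=5, random_state=1)

def auc_with(features):
    """Cross-validated AUC of a logistic regression restricted to a feature subset."""
    proba = cross_val_predict(LogisticRegression(max_iter=1000), X[:, features], y,
                              cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)

# Greedy forward selection: keep adding the feature that raises AUC the most.
selected, best_auc = [], 0.0
remaining = list(range(X.shape[1]))
while remaining:
    scores = {f: auc_with(selected + [f]) for f in remaining}
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_auc:
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_auc = scores[f_best]

# Average probability ensemble over the selected features.
learners = [LogisticRegression(max_iter=1000),
            RandomForestClassifier(random_state=1),
            GradientBoostingClassifier(random_state=1)]
probas = [cross_val_predict(m, X[:, selected], y, cv=5, method="predict_proba")[:, 1]
          for m in learners]
ape_proba = np.mean(probas, axis=0)
print(f"selected features: {selected}, ensemble AUC = {roc_auc_score(y, ape_proba):.3f}")
```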
International Journal of Science and Research Archive, 2024
Software defects and quality assurance are crucial aspects of software development that should be considered throughout the software development cycle. To ensure high-quality software, it is essential to have a robust quality assurance process in place. System reliability and quality are key attributes that must be considered during software development, and this can only be achieved when software undergoes a thorough test process for errors, anomalies, defects, omissions, and bugs. Early software defect prediction and detection play an essential role in ensuring the reliability and quality of software systems, ensuring that software companies discover errors or defects early enough and allocate more resources to defect-prone modules. This study proposes the development of an enhanced classifier model for software defect prediction and detection. The aim is to harness the collective intelligence of selected base classifiers such as Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, K-Nearest Neighbor, GaussianNB, and Multi-Layer Perceptron to improve accuracy, robustness, and generalization in identifying potential defects using a soft voting ensemble technique. The ensemble model leveraged the confidence probabilities of the soft voting technique and the generalization advantage of cross-validation, leading to a more robust and dynamic model. The performance of the model was compared with existing classifiers using accuracy, F1 score, precision, and area under the ROC curve (ROC-AUC) as evaluation metrics. The results of the experiment revealed that the proposed classifier produced an overall accuracy of 93% and a ROC-AUC of 98%. The results demonstrate the effectiveness of our enhanced ensemble classifier in software defect detection and prediction. By harnessing the strengths of diverse base classifiers, our approach provides a robust and adaptive solution to the challenges of detecting and mitigating defects in software systems early. This research contributes to the advancement of reliable software development practices and lays the foundation for future enhancements in ensemble-based defect detection methodologies. Keywords: Base Classifier; Cross-Validation; Ensemble; Machine learning; Software Defect; Soft Voting
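A small sketch of the soft-voting idea described above: a few of the listed base classifiers combined through scikit-learn's VotingClassifier with voting="soft" (probability averaging) and evaluated with cross-validation. Only a subset of the nine base learners is shown and the data is synthetic; both are simplifying assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.85, 0.15], random_state=7)

voter = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=7)),
                ("gb", GradientBoostingClassifier(random_state=7)),
                ("nb", GaussianNB())],
    voting="soft")                     # average the predicted class probabilities

for metric in ("accuracy", "f1", "precision", "roc_auc"):
    score = cross_val_score(voter, X, y, cv=10, scoring=metric).mean()
    print(f"{metric}: {score:.3f}")
```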
IAEME PUBLICATION, 2020
Software fault prediction is a salient factor for any software project: with the growth of IT, software size is increasing day by day, and it becomes very difficult to test each module manually. We therefore need an automated approach that focuses on faulty modules and enables the testing team to test and correct them on time. A pool of base predictors has already been developed and tested. Combining the base predictors using an ensemble strategy, either homogeneous or heterogeneous, will increase the prediction power of the model. Diversity of classifiers is the primary requirement: the more diverse the classifiers are, the more efficient the model will be. This study therefore demonstrates the performance of base predictors and ensemble predictors on software fault datasets and compares them to determine which is better. Various datasets from the PROMISE repository are used for implementing and comparing the models. Naïve Bayes, logistic regression, and decision tree are used as base predictors; Bagging, Boosting, Voting, and Random Forest are used as ensembles. The results of the study show that the prediction power of ensemble techniques is always better than that of a single technique.
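A compact sketch of the comparison this study runs: the three named base predictors (Naïve Bayes, logistic regression, decision tree) against homogeneous ensembles (Bagging, Boosting, Random Forest) and a heterogeneous voting ensemble built from the same base predictors. Synthetic data replaces the PROMISE datasets here, and the F1 score is used as one representative metric.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              RandomForestClassifier, VotingClassifier)

X, y = make_classification(n_samples=1200, n_features=20, weights=[0.8, 0.2], random_state=5)

bases = {"NaiveBayes": GaussianNB(),
         "LogisticRegression": LogisticRegression(max_iter=1000),
         "DecisionTree": DecisionTreeClassifier(random_state=5)}
ensembles = {"Bagging": BaggingClassifier(random_state=5),
             "Boosting": AdaBoostClassifier(random_state=5),
             "RandomForest": RandomForestClassifier(random_state=5),
             "Voting": VotingClassifier([(k, v) for k, v in bases.items()], voting="hard")}

# Compare each base predictor and each ensemble with 10-fold cross-validation.
for name, model in {**bases, **ensembles}.items():
    f1 = cross_val_score(model, X, y, cv=10, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")
```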
Software Defect Prediction Using the Machine Learning Methods
Problems of Information Technology
Reliability of software systems is one of the main indicators of quality. Defects introduced when developing software systems have a direct effect on reliability. Precise prediction of defects in software systems helps software engineers to ensure the reliability of software systems and to properly allocate resources for the testing process. Developing an ensemble method by combining several classification methods is one of the main directions of research in the field of defect prediction in software modules. This paper proposes a method based on the application of ensemble learning for defect detection. Here, datasets obtained from the PROMISE and GITHUB software engineering repositories are used to detect defects. Experiments are conducted using the Weka software. The prediction efficiency is evaluated based on F-measure and ROC-area. As a result of the experiments, the defect detection accuracy of the proposed method is shown to be higher than that of individual machine learning...
Software Defect Prediction Using Ensemble Learning: An ANP Based Evaluation Method
FUOYE Journal of Engineering and Technology
Software defect prediction (SDP) is the process of predicting defects in software modules; it identifies the modules that are defective and require extensive testing. Classification algorithms that help to predict software defects play a major role in the software engineering process. Some studies have shown that the use of ensembles is often more accurate than using single classifiers. However, findings vary across studies, which posit that the efficiency of learning algorithms may differ under different performance measures. This is because most studies on SDP consider the accuracy of the model or classifier above other performance metrics. This paper evaluated the performance of single classifiers (SMO, MLP, kNN, and Decision Tree) and ensembles (Bagging, Boosting, Stacking, and Voting) in SDP, considering major performance metrics, using the Analytic Network Process (ANP) multi-criteria decision method. The experiment was based on 11 performance metrics over 11 software defect datasets...
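This paper ranks classifiers over many metrics with ANP. A full ANP supermatrix is beyond a short sketch, so the snippet below only illustrates the simpler multi-criteria step it generalises: normalising several metric scores per classifier and combining them with criterion weights to obtain a ranking. All scores and weights are hypothetical values introduced purely for illustration, not the paper's results.

```python
import numpy as np

models = ["SMO", "MLP", "kNN", "DecisionTree", "Bagging", "Boosting"]
metrics = ["accuracy", "auc", "f_measure", "mcc"]

# Rows = models, columns = metrics (hypothetical values, not measured results).
scores = np.array([[0.81, 0.74, 0.79, 0.35],
                   [0.83, 0.78, 0.80, 0.40],
                   [0.79, 0.72, 0.77, 0.31],
                   [0.82, 0.75, 0.78, 0.37],
                   [0.85, 0.81, 0.83, 0.45],
                   [0.84, 0.80, 0.82, 0.43]])
weights = np.array([0.3, 0.3, 0.2, 0.2])        # assumed criterion weights

normalised = scores / scores.sum(axis=0)         # column-wise normalisation
ranking = normalised @ weights                   # weighted aggregation per model
for name, value in sorted(zip(models, ranking), key=lambda t: -t[1]):
    print(f"{name}: {value:.4f}")
```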
Software Defects Prediction At Method Level Using Ensemble Learning Techniques
International Journal of Intelligent Computing and Information Sciences
Creating error-free software artifacts is essential to increase software quality and potential reusability. However, testing software artifacts to find defects and fix them is time-consuming and costly; thus, predicting the most error-prone software components can optimize the testing process by focusing testing resources on those components to save time and money. Much software defect prediction research has focused on higher granularity, e.g., the file and package levels, and less has focused on the method level. In this paper, software defect prediction will be performed on highly imbalanced method-level datasets extracted from 23 open source Java projects. Eight ensemble learning algorithms will be applied to the datasets: AdaBoost, Bagging, Gradient Boosting, Random Forest, Random Undersampling Boost, Easy Ensemble, Balanced Bagging, and Balanced Random Forest. The results showed that the Balanced Random Forest classifier achieved the best results regarding recall and ROC-AUC values.
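A hedged sketch of part of that comparison: several of the listed imbalance-aware ensembles (available in the imbalanced-learn package) evaluated with recall and ROC-AUC on a heavily skewed synthetic dataset standing in for the method-level defect data. Dataset shape, fold count, and the model subset are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.ensemble import RandomForestClassifier
from imblearn.ensemble import (BalancedRandomForestClassifier,
                               BalancedBaggingClassifier,
                               EasyEnsembleClassifier,
                               RUSBoostClassifier)

# Strongly imbalanced data (~5% defective), mimicking method-level defect sets.
X, y = make_classification(n_samples=5000, n_features=25, weights=[0.95, 0.05], random_state=3)

models = {
    "RandomForest": RandomForestClassifier(random_state=3),
    "BalancedRandomForest": BalancedRandomForestClassifier(random_state=3),
    "BalancedBagging": BalancedBaggingClassifier(random_state=3),
    "EasyEnsemble": EasyEnsembleClassifier(random_state=3),
    "RUSBoost": RUSBoostClassifier(random_state=3),
}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=["recall", "roc_auc"])
    print(f"{name}: recall={cv['test_recall'].mean():.3f}, "
          f"roc_auc={cv['test_roc_auc'].mean():.3f}")
```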
Defect Prediction : Effect of Feature Selection and Ensemble Methods
2018
Software defect prediction is the process of locating defective modules in software. It improves testing efficiency and, consequently, software quality, and it enables timely identification of fault-prone modules. The use of single classifiers and ensembles for predicting defects in software has been met with inconsistent results. Previous analyses say ensembles are often more accurate, are less affected by noise in datasets, and achieve lower average error rates than any of the constituent classifiers. However, inconsistencies exist across these experiments, and the performance of learning algorithms may vary under different performance measures and circumstances. Therefore, more research is needed to evaluate the performance of ensemble algorithms in software defect prediction. Adding feature selection reduces datasets to fewer features and improves classifier and ensemble performance on them. The goal of this paper is to assess the efficie...
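A minimal illustration of the effect this paper studies: the same ensemble evaluated with and without a feature-selection step, so the impact of reducing the feature set can be compared directly. SelectKBest is used here as a stand-in for whichever selection techniques the paper applies, and the data and choice of k are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Many features, few of them informative -- the case where selection should help.
X, y = make_classification(n_samples=1500, n_features=30, n_informative=6, random_state=11)

plain = RandomForestClassifier(random_state=11)
with_fs = Pipeline([("select", SelectKBest(f_classif, k=10)),
                    ("rf", RandomForestClassifier(random_state=11))])

for name, model in {"ensemble only": plain,
                    "feature selection + ensemble": with_fs}.items():
    auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.3f}")
```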
Software Defect Prediction Using Ensemble Learning: A Systematic Literature Review
IEEE Access
Recent advances in the domain of software defect prediction (SDP) include the integration of multiple classification techniques to create an ensemble or hybrid approach. This technique was introduced to improve prediction performance by overcoming the limitations of any single classification technique. This research provides a systematic literature review on the use of the ensemble learning approach for software defect prediction. The review is conducted after critically analyzing research papers published since 2012 in four well-known online libraries: ACM, IEEE, Springer Link, and Science Direct. In this study, five research questions covering the different aspects of research progress on the use of ensemble learning for software defect prediction are addressed. To extract the answers to the identified questions, the 46 most relevant papers are shortlisted after a thorough systematic research process. This study provides compact information regarding the latest trends and advances in ensemble learning for software defect prediction and provides a baseline for future innovations and further reviews. Through our study, we discovered that the ensemble methods most frequently employed by researchers are random forest, boosting, and bagging; less frequently employed methods include stacking, voting, and Extra Trees. Researchers proposed many promising frameworks, such as EMKCA, SMOTE-Ensemble, MKEL, SDAEsTSE, TLEL, and LRCR, using ensemble learning methods. AUC, accuracy, F-measure, recall, precision, and MCC were mostly utilized to measure the prediction performance of models. WEKA was widely adopted as a platform for machine learning. Many researchers showed through empirical analysis that feature selection and data sampling are necessary pre-processing steps that improve the performance of ensemble classifiers. Index Terms: Systematic literature review (SLR); ensemble classifier; hybrid classifier; software defect prediction.