Improving model performance for software defect detection and prediction using ensemble method and cross validation techniques

Software defects and quality assurance are crucial concerns throughout the software development cycle. Ensuring high-quality software requires a robust quality assurance process. System reliability and quality are key considerations during development, and they can only be achieved when software undergoes thorough testing for errors, anomalies, defects, omissions, and bugs. Early software defect prediction and detection play an essential role in ensuring the reliability and quality of software systems, because they allow software companies to discover defects early and allocate more resources to defect-prone modules. This study proposes an enhanced classifier model for software defect prediction and detection. The aim is to harness the collective intelligence of selected base classifiers (Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, K-Nearest Neighbor, GaussianNB, and Multi-Layer Perceptron) through a soft voting ensemble technique, in order to improve accuracy, robustness, and generalization in identifying potential defects. The ensemble model leverages the class-probability estimates used in soft voting and the generalization advantage of cross-validation, leading to a more robust and dynamic model. The performance of the model was compared with that of existing classifiers using accuracy, F1 score, precision, and area under the ROC curve (ROC-AUC) as the evaluation metrics. The experimental results show that the proposed classifier achieved an overall accuracy of 93% and a ROC-AUC of 98%. The results demonstrate the effectiveness of our enhanced ensemble classifier in software defect detection and prediction.
By harnessing the strengths of diverse base classifiers, our approach provides a robust and adaptive solution to the challenges of early detection and mitigation of defects in software systems. This research contributes to the advancement of reliable software development practices and lays the foundation for future enhancements in ensemble-based defect detection methodologies.

Keywords: Base Classifier; Cross-Validation; Ensemble; Machine Learning; Software Defect; Soft Voting
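
The soft voting and cross-validation ideas named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the function names (`soft_vote`, `predict_soft`, `predict_hard`, `k_fold_indices`) and the example probabilities are hypothetical and chosen only for illustration. Soft voting averages the class probabilities reported by each base classifier and picks the class with the highest average, whereas hard voting counts each classifier's single predicted label.

```python
from collections import Counter
from typing import Iterator, List, Optional, Sequence, Tuple


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the (weighted) mean class probabilities for one sample.

    `probabilities[m][c]` is the probability that model m assigns to class c.
    """
    if not probabilities:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


def predict_soft(probabilities: Sequence[Sequence[float]]) -> int:
    """Soft voting: class with the highest averaged probability."""
    averaged = soft_vote(probabilities)
    return max(range(len(averaged)), key=averaged.__getitem__)


def predict_hard(probabilities: Sequence[Sequence[float]]) -> int:
    """Hard voting: majority of each model's own top class."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    return Counter(labels).most_common(1)[0][0]


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical probabilities from three base classifiers for one module,
# ordered as [not defective, defective]. Values are invented for illustration.
module_probs = [
    [0.90, 0.10],  # model A: confident the module is clean
    [0.40, 0.60],  # model B: weakly predicts a defect
    [0.45, 0.55],  # model C: weakly predicts a defect
]

print("soft vote:", predict_soft(module_probs))  # class 0 (not defective)
print("hard vote:", predict_hard(module_probs))  # class 1 (defective)
```

In this toy case the two schemes disagree: two of three models lean towards "defective", but the one confident model outweighs them once probabilities are averaged. In practice, libraries such as scikit-learn provide this behaviour through `VotingClassifier` with `voting="soft"`, together with cross-validation utilities; the abstract does not state which tooling the study used.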