SOFTWARE FAULT PREDICTION BASED ON RANDOM FOREST ALGORITHM
Related papers
Robust Prediction of Fault-Proneness by Random Forests
2004
Accurate prediction of fault-prone modules (a module is equivalent to a C function or a C++ method) in the software development process enables effective detection and identification of defects. Such prediction models are especially beneficial for large-scale systems, where verification experts need to focus their attention and resources on problem areas in the system under development. This paper presents a novel methodology for predicting fault-prone modules, based on random forests. Random forests are an extension of decision tree learning. Instead of generating one decision tree, this methodology generates hundreds or even thousands of trees using subsets of the training data. The classification decision is obtained by voting. We applied random forests in five case studies based on NASA data sets. The prediction accuracy of the proposed methodology is generally higher than that achieved by logistic regression, discriminant analysis and the algorithms in two machine learning software packages, WEKA [I. H. Witten et al. (1999)] and See5. The difference in performance between the proposed methodology and the other methods is statistically significant. Further, the advantage of random forests in classification accuracy over the other methods is more pronounced on larger data sets.
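The tree-building and voting idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: each "tree" is simplified to a one-split stump on one randomly chosen feature, each stump is trained on a bootstrap sample, and the forest predicts by majority vote. The function names (`train_stump`, `train_forest`, `predict`) and the toy module metrics are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, rng):
    """Fit a one-split tree on one randomly chosen feature (hypothetical helper)."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # No usable split (e.g. all values equal): predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label

def train_forest(rows, labels, n_trees=101, seed=0):
    """Train n_trees stumps, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], rng))
    return forest

def predict(forest, row):
    """Classify one module by majority vote over all trees."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy data (assumed): (lines of code, cyclomatic complexity) per module,
# label 1 = fault-prone, 0 = not fault-prone.
rows = [(10, 1), (15, 2), (20, 2), (25, 3),
        (200, 15), (250, 20), (300, 22), (400, 30)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
small_module = predict(forest, (12, 1))
large_module = predict(forest, (350, 25))
```

A single stump trained on an unlucky bootstrap sample can misclassify a module, but the vote across many stumps smooths such errors out, which is the intuition behind the robustness the abstract reports.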
Usage Patterns and Implementation of Random Forest Methods for Software Risk and Bugs Predictions
International Journal of Innovative Technology and Exploring Engineering, 2019
This work addresses software bug prediction, in which datasets of different types of bugs are evaluated to support further predictions. In this research manuscript, a pragmatic evaluation of the random forest approach is carried out and its results are compared with those of traditional artificial neural networks (ANN). The results from random forest are more accurate on the test datasets used, which are in a specific format. The Random Forest (RF) approach adopted in this work gives effective outcomes in most cases compared with ANN, and thereby the usage patterns of RF are performance aware. The RF paradigm is widely used in engineering optimization to solve complex problems and to generate dynamic trees. The results obtained and presented in this work favor random forest based optimization for software risk management and predictive mining. The need of the proposed wo...
Evaluation of Machine Learning Classification Techniques in Predicting Software Defects
TMLAI, 2020
Advances in technology have brought about a significant rise in the number of software systems being developed and deployed on a daily basis. This has brought about greater dependence on software, and any defect in the software can lead to calamitous issues due to the type of data stored in the software or the type and importance of the functions the software performs. It is necessary to make sure that all defects are properly identified before deployment. The purpose of this study is to evaluate classifiers on a software defects dataset and recommend an appropriate classifier for defective software prediction. This will save the developer the stress and time of searching for defects throughout the program code, which will in turn help keep the software free from defects that could cause problems in its future use. In this research, six categories of machine learning algorithms (two algorithms from each category) were tested in the Waikato Environment for Knowledge Analysis (WEKA): Bayes (Naive Bayes and Bayes Net), Functions (Multilayer Perceptron (MLP) and Sequential Minimal Optimization (SMO)), Lazy (IBK and KStar), Meta (Random Committee and Bagging), Rules (Decision Table and JRip) and Trees (J48 and Random Forest). The PROMISE dataset was used, and the performance metrics recorded were accuracy, false positive rate, precision, recall, F-measure, Receiver Operating Characteristic (ROC), Kappa statistic and Root Mean Square Error (RMSE). Random Forest was observed to perform better under 10-fold cross-validation than the other algorithms tested, with an accuracy of 0.818, a recall of 0.818, an F-measure of 0.787, a ROC of 0.755 and an RMSE of 0.3669.
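Several of the metrics listed in this abstract follow directly from the confusion matrix and the predicted probabilities. The sketch below computes them for a binary defect label using the standard textbook definitions. The function name `binary_metrics`, the 0.5 decision threshold and the toy data are assumptions, and the figures reported in the abstract may have been averaged across classes, which this sketch does not do.

```python
import math

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Metrics for the defective class (label 1) from predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, predicted clean
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    squared_error = sum((t - p) ** 2 for t, p in zip(y_true, y_prob))
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "false_positive_rate": safe_div(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "rmse": math.sqrt(safe_div(squared_error, len(y_true))),
    }

# Toy data (assumed): true defect labels and a classifier's predicted
# probability that each module is defective.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1]
metrics = binary_metrics(y_true, y_prob)
```

On this toy data the classifier catches two of three defective modules and raises one false alarm among five clean modules, so accuracy is 0.75 and the false positive rate is 0.2.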
International Journal on Recent and Innovation Trends in Computing and Communication
The software systems of modern computers are extremely complex and versatile. Therefore, it is essential to regularly detect and correct software design faults. In order to devote resources effectively towards the creation of trustworthy software, software companies are increasingly engaging in the practice of predicting fault-prone modules in advance of testing. These software fault prediction methods rely on the thoroughness with which the faults of prior software versions, as well as the related code, have been retrieved. Time, energy, and money are all saved as a result. This greatly increases the company's initial success and bottom line by satisfying its clientele. Numerous academics have poured into this area throughout the years in an effort to raise the bar for all software. Nowadays, the most often used approaches in this field are those based on machine learning (ML). The field of ML seeks to perfect software capable of evolving as well as adapting in response to fresh data. This pape...
Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality
An understanding of quality attributes is relevant for the software organization to deliver high software reliability. An empirical assessment of metrics to predict the quality attributes is essential in order to gain insight about the quality of software in the early phases of software development and to ensure corrective actions. In this paper, we build a model to estimate fault proneness using Object Oriented CK metrics and QMOOD metrics. We apply one statistical method and six machine learning methods to build the models. The proposed models are validated using a dataset collected from open source software. The results are analyzed using the Area Under the Curve (AUC) obtained from Receiver Operating Characteristics (ROC) analysis. The results show that the models built using the random forest and bagging methods outperformed all the other models. Hence, based on these results it is reasonable to claim that quality models have a significant relevance with Object Oriented metrics and that machine learning methods have a comparable performance with statistical methods.
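The AUC used to compare models in this abstract can be computed without plotting the ROC curve: it equals the probability that a randomly chosen faulty module receives a higher score than a randomly chosen non-faulty one, with ties counted as one half. The sketch below applies that pairwise definition directly. The function name `roc_auc` and the toy scores are assumptions, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]  # faulty modules
    neg = [s for t, s in zip(y_true, scores) if t == 0]  # non-faulty modules
    if not pos or not neg:
        raise ValueError("AUC needs at least one faulty and one non-faulty module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # faulty module ranked above non-faulty one
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))

# Toy data (assumed): true fault labels and predicted fault-proneness scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1]
auc = roc_auc(y_true, scores)
```

Here 14 of the 15 faulty/non-faulty pairs are ordered correctly, giving an AUC of 14/15. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.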
World Academy of Research in Science and Engineering, 2019
In software development, handling software faults is an important task. The presence of faults reduces software quality and increases development cost. A large number of models have been presented for cross-project fault prediction in software systems. The task of fault prediction is difficult because most of these models provide inadequate information. The method proposed in this paper is a hybrid model for predicting cross-project faults in a given software system using the random forest (RF) technique and the multi-objective Ant Lion optimization (MO-ALO) approach. The experiments use data from eight software projects in the PROMISE data repository. The performance of the method is evaluated against existing techniques, namely Support Vector Machine (SVM), RF and K-nearest Neighbor (KNN). The results for cross-project fault prediction are shown for the RF and MO-ALO based model.