Machine Learning or Information Retrieval Techniques for Bug Triaging: Which is better?

Using Machine Learning Methods for Automatic Bug Assignment to Developers

2020

Background and Objectives: It is generally accepted that the highest cost in software development is associated with the software maintenance phase. In corrective maintenance, the main task is correcting the bugs found by users. Users submit these bugs to a Bug Tracking System (BTS), where a bug triager evaluates them and assigns them to developers for correction. To find a suitable developer to correct a bug, recent developer activity and previous bug fixes must be examined. This paper presents an automated method that assigns bugs to developers by identifying the similarity between new bugs and previously reported bug reports. Methods: For automatic bug assignment, four clustering techniques (Expectation-Maximization (EM), Farthest First, Hierarchical Clustering, and Simple K-Means) are used, and a tag is created for each cluster that indicates the developer associated with correcting its bugs. To evaluate the quality of the proposed methods, the clusters they generate are compared with the labels suggested by an expert triager. Results: To evaluate the performance of the proposed method, we use real-world data from a large-scale web-based system stored in the BTS of a software company. To select the appropriate clustering algorithm, the output of each algorithm is compared with the labels suggested by the expert triager, and the algorithm whose output is closest to the expert opinion is selected as the best. The results show that the EM and Farthest First clustering algorithms, with a 3% similarity error, agree most closely with the expert opinion. Conclusion: The results obtained by the algorithms show that they can be successfully applied to bug assignment in real-world software development environments.
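
As an illustration of the cluster-then-tag idea, the following is a minimal Python sketch of one of the four algorithms named above, Farthest First, applied to bag-of-words bug reports. The whitespace tokenization, the cosine distance, starting from the first report (the classic algorithm starts from an arbitrary point), the sample data, and the helper names (`farthest_first`, `nearest_center`, `tag_clusters`) are assumptions for illustration, not the paper's implementation. Each cluster is tagged with the developer who fixed most of its past bugs, and a new report inherits the tag of its nearest center.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase whitespace tokenization into a term-count vector."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def nearest_center(v, vectors, centers):
    """Index (into `centers`) of the center closest to vector v."""
    return min(range(len(centers)), key=lambda j: cosine_distance(v, vectors[centers[j]]))


def farthest_first(vectors, k):
    """Farthest First traversal: start from vectors[0] (an assumption), then
    repeatedly add the point farthest from its nearest chosen center."""
    centers = [0]
    while len(centers) < min(k, len(vectors)):
        candidates = [i for i in range(len(vectors)) if i not in centers]
        best = max(candidates,
                   key=lambda i: min(cosine_distance(vectors[i], vectors[c]) for c in centers))
        centers.append(best)
    # Each point joins the cluster of its nearest center.
    assignment = [nearest_center(v, vectors, centers) for v in vectors]
    return centers, assignment


def tag_clusters(assignment, developers):
    """Tag each cluster with the developer who fixed most of its past bugs."""
    votes = {}
    for cluster, dev in zip(assignment, developers):
        votes.setdefault(cluster, Counter())[dev] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


# Hypothetical history of fixed bugs and the developers who fixed them.
reports = [
    "login page crashes on submit",
    "login button crashes page",
    "invoice total rounding error",
    "invoice pdf rounding wrong",
]
developers = ["alice", "alice", "bob", "bob"]

vectors = [bag_of_words(r) for r in reports]
centers, assignment = farthest_first(vectors, k=2)
tags = tag_clusters(assignment, developers)

new_bug = bag_of_words("checkout invoice rounding bug")
suggested = tags[nearest_center(new_bug, vectors, centers)]
print(assignment, tags, suggested)  # [0, 0, 1, 1] {0: 'alice', 1: 'bob'} bob
```

Comparing `assignment` against expert-provided labels, as the abstract describes, would then measure how closely each clustering algorithm matches the triager.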

TRAM: An Approach for Assigning Bug Reports using their Metadata

Bug triage is an essential phase in the bug fixing process. Its aim is to assign an experienced developer to each newly submitted bug report. Existing bug triage approaches are mainly based on machine learning techniques, and these approaches suffer from low prediction accuracy. In this paper, we propose TRAM (TRiaging Approach using bug reports Metadata). The goal is to improve the prediction accuracy of bug triage by utilizing the most discriminating terms of bug reports, the components to which the bugs belong, and the reporter who filed each bug. We perform an experimental evaluation on four open-source projects: Freedesktop, NetBeans, Eclipse, and Firefox. The results show that TRAM outperforms existing machine-learning-based approaches in terms of classification accuracy, improving the F-score by approximately 34%, 40%, 20%, and 21% for Freedesktop, NetBeans, Eclipse, and Firefox, respectively.
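
TRAM's exact model is not described in this abstract. As a hedged, information-retrieval-style sketch of how terms, component, and reporter might be combined, the code below adds `component=` and `reporter=` tokens to each report's terms, builds one profile per developer from past fixes, and ranks developers by cosine similarity. The feature encoding, the profile construction, the sample data, and all function names are assumptions; selecting the most discriminating terms is omitted for brevity.

```python
import math
from collections import Counter, defaultdict


def features(summary, component, reporter):
    """Terms of the summary plus one token each for component and reporter."""
    f = Counter(summary.lower().split())
    f["component=" + component] += 1
    f["reporter=" + reporter] += 1
    return f


def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_profiles(history):
    """history: (summary, component, reporter, fixing_developer) tuples."""
    profiles = defaultdict(Counter)
    for summary, component, reporter, dev in history:
        profiles[dev].update(features(summary, component, reporter))
    return profiles


def rank_developers(profiles, summary, component, reporter):
    """Developers sorted by similarity between their profile and the new report."""
    query = features(summary, component, reporter)
    return sorted(profiles, key=lambda d: cosine(query, profiles[d]), reverse=True)


# Hypothetical history of fixed bugs.
history = [
    ("crash when opening file dialog", "UI", "carol", "alice"),
    ("dialog layout broken on resize", "UI", "dave", "alice"),
    ("segfault in parser on empty input", "Core", "carol", "bob"),
    ("parser rejects valid unicode", "Core", "erin", "bob"),
]
profiles = build_profiles(history)
ranking = rank_developers(profiles, "parser crash on large input", "Core", "carol")
print(ranking)  # ['bob', 'alice']
```

The metadata tokens let a shared component or reporter raise a developer's score even when the summary wording differs.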

Literature survey on automatic bug triaging using machine learning techniques

INNOVATIONS IN COMPUTATIONAL AND COMPUTER TECHNIQUES: ICACCT-2021

Among all the phases of the software development life cycle, the software testing phase is the most crucial. In this phase, errors or bugs are reported by the tester. Any kind of unexpected behavior in the software is referred to as a bug; it can also be called an error or a flaw. As soon as the tester reports a bug, the next step is to fix it, that is, to assign the bug to an appropriate developer, a process known as bug triaging. Companies spend a large amount on the bug triaging process. Assigning a bug to an appropriate developer depends on various factors, such as the severity, priority, and risk associated with the bug. The key idea behind bug triaging is to find a suitable developer who can fix the bug in a short span of time. In this paper, we address the factors that impact the performance of the bug triaging process, and a later section presents a comparative analysis of various automatic bug triaging processes.

A time-based approach to automatic bug report assignment

Bug assignment is one of the important activities in bug triaging; it aims to assign bugs to the appropriate developers for fixing. Many proposed automatic bug assignment approaches are based on text analysis methods such as machine learning and information retrieval. Most of these approaches use term-weighting techniques, such as term frequency-inverse document frequency (tf-idf), to determine the weight of terms. However, existing term-weighting techniques consider only term frequency, ignoring the metadata associated with the terms in software repositories. This paper aims to improve automatic bug assignment by incorporating time metadata into tf-idf (Time-tf-idf). In the Time-tf-idf technique, the recency of a developer's use of a term is considered when determining the developer's expertise. The proposed automatic bug assignment approach that uses Time-tf-idf, called ABA-Time-tf-idf, was evaluated on three open-source projects. The evaluation shows improvements in accuracy and mean reciprocal rank (MRR) of up to 11.8% and 8.94%, respectively, compared with tf-idf. Moreover, ABA-Time-tf-idf outperforms commonly used automatic bug assignment approaches in accuracy and MRR by up to 45.52% and 55.54%, respectively. Consequently, considering time metadata in term weighting leads to reasonable improvements in automatic bug assignment.
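
The abstract does not give the Time-tf-idf formula. The sketch below assumes a simple exponential decay: each time a developer uses a term, the use contributes `0.5 ** (age / half_life)` to that developer's term frequency, which is then multiplied by an idf computed over developers. The half-life, the `log(1 + N/df)` idf variant, the sample data, and the helper names are hypothetical choices, not the paper's. It also shows how MRR, one of the reported metrics, is computed: the mean over bugs of 1 / (rank of the developer who actually fixed it).

```python
import math
from collections import defaultdict


def time_tf_idf(history, now, half_life=30.0):
    """history: (developer, day, terms) records of past term usage.
    Assumed weighting: each use decays as 0.5 ** (age / half_life)."""
    tf = defaultdict(lambda: defaultdict(float))
    for dev, day, terms in history:
        decay = 0.5 ** ((now - day) / half_life)
        for term in terms:
            tf[dev][term] += decay
    n_devs = len(tf)
    df = defaultdict(int)  # number of developers who used each term
    for dev_terms in tf.values():
        for term in dev_terms:
            df[term] += 1
    return {
        dev: {t: w * math.log(1 + n_devs / df[t]) for t, w in dev_terms.items()}
        for dev, dev_terms in tf.items()
    }


def rank(expertise, query_terms):
    """Developers sorted by summed expertise over the new bug's terms."""
    scores = {dev: sum(w.get(t, 0.0) for t in query_terms) for dev, w in expertise.items()}
    return sorted(scores, key=scores.get, reverse=True)


def mean_reciprocal_rank(rankings, fixers):
    """Mean of 1 / (position of the actual fixer); 0 if the fixer is absent."""
    total = 0.0
    for ranking, fixer in zip(rankings, fixers):
        if fixer in ranking:
            total += 1.0 / (ranking.index(fixer) + 1)
    return total / len(fixers)


# Hypothetical usage history (days since project start).
history = [
    ("alice", 0, ["parser", "crash"]),   # old activity
    ("alice", 90, ["ui", "dialog"]),
    ("bob", 95, ["parser", "timeout"]),  # recent activity
]
expertise = time_tf_idf(history, now=100)
queries = [["parser", "crash"], ["dialog"], ["crash"]]
fixers = ["bob", "alice", "bob"]
rankings = [rank(expertise, q) for q in queries]
print(rankings)  # [['bob', 'alice'], ['alice', 'bob'], ['alice', 'bob']]
print(mean_reciprocal_rank(rankings, fixers))  # about 0.833
```

Because Alice's parser work is old, Bob's recent use of "parser" outweighs it, which is the kind of effect a recency-aware weighting is meant to capture.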

An Extended Survey Concerning the Significance of Artificial Intelligence and Machine Learning Techniques for Bug Triage and Management

IEEE Access

Bug reports are generated in large numbers during software development in the software industry. Processing these issues manually is usually time-consuming and error-prone, which delays the entire software development process. A properly designed bug triage and management process therefore requires that essential operations, such as duplicate detection, assigning bugs to the proper developers, and determining their importance level, be supported by efficient algorithmic models and implementation approaches. Designing and implementing such a process has become an essential scientific research topic, as it may significantly optimize software development and business processes in the information technology industry. Consequently, this paper thoroughly surveys the most significant related scientific contributions in an analytical and constructive manner, which distinguishes it from similar survey papers. The paper proposes optimal algorithmic and software solutions for the particular real-world use cases it analyzes, and concludes by presenting the most important open research questions and challenges. Additionally, the paper provides a valuable scientific literature survey for any researcher or practitioner working on software bug triage and management systems based on artificial intelligence and machine learning techniques.

DESIGN OF AN EFFECTIVE MECHANISM FOR AUTOMATED BUG TRIAGE SYSTEM

Nowadays, IT companies spend more than 45 percent of their costs on fixing software bugs. Traditionally, these bugs are fixed by manually assigning them to a particular developer, an approach that creates too much dependency on manual work. A newer alternative is an automated bug triage system, which assigns each reported bug to a developer automatically, reducing the time and cost of manual work. We combine instance selection with feature selection to simultaneously reduce the data scale on the bug dimension and the word dimension. We propose applying machine learning to bug triage to predict which developer should be assigned to a bug, based on its description, using text categorization.
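
The data-reduction idea, shrinking both the word dimension and the bug dimension, can be sketched as follows. This is a simplified stand-in, not the method of the paper: feature selection keeps the `k` words with the highest chi-square association to any developer, and instance selection simply drops reports left with no selected words. The chi-square criterion, the sample data, and the function names are assumptions for illustration.

```python
def chi_square_scores(docs, labels):
    """docs: list of word sets; labels: fixing developer of each doc.
    Score of a word = max over developers of the 2x2 chi-square statistic."""
    n = len(docs)
    scores = {}
    for word in set().union(*docs):
        best = 0.0
        for dev in set(labels):
            a = sum(1 for d, l in zip(docs, labels) if word in d and l == dev)      # word, dev
            b = sum(1 for d, l in zip(docs, labels) if word in d and l != dev)      # word, other
            c = sum(1 for d, l in zip(docs, labels) if word not in d and l == dev)  # no word, dev
            e = n - a - b - c                                                      # no word, other
            denom = (a + c) * (b + e) * (a + b) * (c + e)
            if denom:
                best = max(best, n * (a * e - b * c) ** 2 / denom)
        scores[word] = best
    return scores


def reduce_data(docs, labels, k):
    """Keep the k top-scoring words, then drop reports left with no words."""
    scores = chi_square_scores(docs, labels)
    keep = set(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    reduced = [(d & keep, l) for d, l in zip(docs, labels)]
    return keep, [(d, l) for d, l in reduced if d]


# Hypothetical tokenized bug reports and their fixers.
docs = [
    {"crash", "dialog", "the"},
    {"dialog", "resize", "the"},
    {"parser", "crash", "the"},
    {"parser", "unicode", "the"},
    {"the"},
]
labels = ["alice", "alice", "bob", "bob", "alice"]
keep, reduced = reduce_data(docs, labels, k=2)
print(sorted(keep), len(reduced))  # ['dialog', 'parser'] 4
```

A word such as "the", which appears in every report, gets a score of zero and is discarded, and the last report disappears once its only word is gone.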

Machine Learning Based Bug Triaging for Severity Identification

2020

The bug triage system is responsible for classifying bugs and allocating them to individual developers. Different parameters, such as severity, priority, and risk factor, are used. The list of defects is taken and assigned to individual developers based on the priority of the bugs. The defects are then ordered by the stories to be completed and assigned to different scrum teams if an agile method is followed, or to other teams if an iterative development model is followed. This process usually takes time and costs the company more. A second problem is that there is either no learning or only manual learning, which takes into consideration the developers' preferences or the kinds of bugs they previously fixed. The defect triaging challenges can be significantly reduced by using machine learning algorithms that take historical solved or assigned bugs as the training set and thereby forecast the severity of a defect. Class label assignment is then done using a machine learning algorithm that combines supervised and unsupervised learning methods. This reduces manual effort and can achieve partial automation of bug assignment. This project aims to use an efficient machine learning technique, SVM with K-Means clustering, to perform bug triaging, enhancing the current system with machine learning and thus minimizing and simplifying human work and increasing the efficiency and productivity of the team. The SVM algorithm is used to classify the test data. For the SVM model, defects are converted into tokens, the tokens are counted, the inverse document frequency of the bug words is computed from these counts, and the grouping is done using k-means.
This method is then compared with conventional Naive Bayes in terms of accuracy and time consumption. The model is evaluated based on the time taken for categorization and the accuracy attained. The datasets of a premier airline, flydubai, were used for this evaluation. The comparison is also made across different group sizes of the same data set, using graphs of data size versus accuracy and data size versus logarithmic loss. The implementation results show that KMEANS-SVM has better accuracy, loss, time, and memory usage than the Naive Bayes and SVM methods. KMEANS-SVM has the highest accuracy, about 99.9912%, followed by SVM at around 98.1116% and Naive Bayes at 96.4420%. KMEANS-SVM also has the lowest loss, about 0.0087, followed by SVM at around 1.888 and Naive Bayes at around 3.557.
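
The preprocessing and grouping steps described above (tokenizing defects, counting tokens, weighting them by inverse document frequency, and grouping with k-means) can be sketched in plain Python. This is a minimal sketch under stated assumptions: whitespace tokenization, L2-normalized vectors, a deterministic initialization from the first `k` reports, hypothetical sample defects, and hypothetical function names. The SVM stage that classifies the grouped data is omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Token counts weighted by log(N / df), L2-normalized per document."""
    tokens = [d.lower().split() for d in docs]
    n = len(tokens)
    df = Counter(t for toks in tokens for t in set(toks))
    vectors = []
    for toks in tokens:
        v = {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vectors.append({t: x / norm for t, x in v.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def centroid(vectors):
    """Component-wise mean of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: x / len(vectors) for t, x in total.items()}


def kmeans(vectors, k, max_iter=20):
    """Lloyd's k-means; assumed deterministic init from the first k vectors."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


defects = [
    "seat map fails to load",
    "payment gateway timeout",
    "seat map loads slowly",
    "payment declined by gateway",
]
print(kmeans(tfidf(defects), k=2))  # [0, 1, 0, 1]
```

Normalizing each vector keeps long reports from dominating the Euclidean distance; without it, a short report can end up closer to an unrelated cluster simply because its vector is smaller.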

An Efficient Model for Effective Bug Triage

2016

Most software companies need to deal with a large number of software bugs every day. Software bugs are unavoidable, and fixing them is an expensive task. The goal of effective bug triaging software is to assign potentially experienced developers to newly submitted bug reports. To reduce the time and cost of bug triaging, this paper proposes an automatic approach that predicts a developer with relevant experience to fix a newly submitted bug report. The paper applies five term selection methods and examines their effect on the accuracy of bug assignment. In addition, the workload among developers is rebalanced based on their experience. The proposed system is intended to suggest, or recommend, a developer for a bug rather than to assign it automatically, which leaves a window for handling real-time crises that arise during the project development lifecycle. Index Terms: Mining software repositories, application of data pre-processing, data management in bug repositories, bug data reduction, feature...
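
The recommend-rather-than-assign design with load rebalancing might look like the following sketch. It assumes a ranked list of candidate developers (for example, produced by one of the term selection methods) and a hypothetical per-developer cap on open bugs; the function name, the cap, the sample loads, and the fallback rule are illustrative assumptions, not the paper's algorithm. The result is a suggestion for the triager, not an automatic assignment.

```python
def recommend(ranked_devs, open_bugs, max_load):
    """Suggest (not assign) a developer for a new bug.

    ranked_devs: candidates ordered by expertise, best first.
    open_bugs:   current number of open bugs per developer.
    max_load:    hypothetical cap on open bugs per developer.
    """
    for dev in ranked_devs:
        if open_bugs.get(dev, 0) < max_load:
            return dev  # most expert developer who still has capacity
    # Everyone is at the cap: fall back to the least-loaded candidate;
    # min() keeps the expertise order when loads tie.
    return min(ranked_devs, key=lambda d: open_bugs.get(d, 0))


ranked = ["alice", "bob", "carol"]
load = {"alice": 5, "bob": 2, "carol": 0}
print(recommend(ranked, load, max_load=4))  # bob
print(recommend(ranked, load, max_load=0))  # carol
```

Keeping the final decision with the triager is what leaves room to override the suggestion during a crisis.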

Using Categorical Features in Mining Bug Tracking Systems to Assign Bug Reports

Most bug assignment approaches use text classification and information retrieval techniques. These approaches use the textual contents of bug reports to build recommendation models. The textual contents of bug reports are usually high-dimensional and a noisy source of information. These approaches suffer from low accuracy and high computational needs. In this paper, we investigate whether categorical fields of bug reports, such as the component to which the bug belongs, are appropriate for representing bug reports instead of the textual description. We build a classification model that uses the categorical features as the representation of a bug report. The experimental evaluation is conducted on three projects: NetBeans, Freedesktop, and Firefox. We compare this approach with two machine-learning-based bug assignment approaches. The evaluation shows that using the textual contents of bug reports is important. In addition, it shows that the categorical features can impr...
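
To illustrate using categorical fields alone as the representation, here is a minimal categorical Naive Bayes sketch with Laplace smoothing. The choice of Naive Bayes, the field names (`component`, `product`), the sample reports, and the function names are assumptions; the abstract does not say which classifier the authors used.

```python
import math
from collections import Counter, defaultdict


def train(reports, developers):
    """reports: dicts of categorical fields; developers: who fixed each one."""
    prior = Counter(developers)
    counts = defaultdict(Counter)  # (developer, field) -> Counter of values
    values = defaultdict(set)      # field -> all values seen in training
    for report, dev in zip(reports, developers):
        for field, value in report.items():
            counts[(dev, field)][value] += 1
            values[field].add(value)
    return prior, counts, values


def predict(model, report):
    """Developer with the highest Naive Bayes log-posterior for the report."""
    prior, counts, values = model
    total = sum(prior.values())

    def log_posterior(dev):
        score = math.log(prior[dev] / total)
        for field, value in report.items():
            seen = counts[(dev, field)]
            # Laplace smoothing; the extra +1 reserves mass for unseen values.
            score += math.log((seen[value] + 1) / (sum(seen.values()) + len(values[field]) + 1))
        return score

    return max(prior, key=log_posterior)


reports = [
    {"component": "UI", "product": "Editor"},
    {"component": "UI", "product": "Browser"},
    {"component": "Networking", "product": "Browser"},
    {"component": "Networking", "product": "Browser"},
]
developers = ["alice", "alice", "bob", "bob"]
model = train(reports, developers)
print(predict(model, {"component": "Networking", "product": "Editor"}))  # bob
print(predict(model, {"component": "UI"}))  # alice
```

Each report is reduced to a handful of low-cardinality values, which is why this representation is far cheaper than a full text vector.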

Bug Prioritization to Facilitate Bug Report Triage

Journal of Computer Science and Technology, 2012

The large number of new bug reports received in the bug repositories of software systems makes their management a challenging task. Handling these reports manually is time-consuming and often delays the resolution of important bugs. To address this issue, a recommender can be developed that automatically prioritizes new bug reports. In this paper, we propose and evaluate a classification-based approach to build such a recommender. We use the Naïve Bayes and Support Vector Machine (SVM) classifiers and compare them to evaluate which performs better in terms of accuracy. Since a bug report contains both categorical and text features, we also evaluate which combination of features best determines the priority of a bug. To evaluate the bug priority recommender, we use precision and recall, and we also propose two new measures, Nearest False Negatives (NFN) and Nearest False Positives (NFP), which provide insight into the results produced by precision and recall. We find that SVM performs better than Naïve Bayes for text features, whereas Naïve Bayes performs better than SVM for categorical features. The highest accuracy is achieved with SVM when categorical and text features are combined for training.
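
Per-class precision and recall, the standard measures used to evaluate the priority recommender, can be computed as below. The priority labels (`P1` to `P3`) and the data are made up for illustration, and the paper's NFN and NFP measures are not reproduced because their definitions are not given here.

```python
def precision_recall(actual, predicted, labels):
    """Per-label (precision, recall) for a multi-class prediction."""
    pairs = list(zip(actual, predicted))
    result = {}
    for label in labels:
        tp = sum(1 for a, p in pairs if a == label and p == label)
        fp = sum(1 for a, p in pairs if a != label and p == label)
        fn = sum(1 for a, p in pairs if a == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[label] = (precision, recall)
    return result


actual = ["P1", "P2", "P2", "P3", "P3", "P3"]
predicted = ["P1", "P2", "P3", "P3", "P3", "P2"]
metrics = precision_recall(actual, predicted, ["P1", "P2", "P3"])
for label, (p, r) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(f"accuracy={accuracy:.2f}")  # accuracy=0.67
```

Accuracy summarizes all classes in one number, while per-class precision and recall show where a classifier confuses neighboring priority levels.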