An Initial Study on the Bug Report Duplication Problem
Related papers
One Step More to Understand the Bug Report Duplication Problem
Software …, 2010
According to recent work, duplicate bug reports negatively affect software maintenance and evolution productivity due to, among other factors, the increased time spent on report analysis and validation. A considerable amount of time is therefore lost on duplicate bug report analysis. This work presents an exploratory study using data from bug trackers of private and open source projects, in order to understand the possible factors (e.g., software lifetime, size, number of bug reports) that cause bug report duplication and its impact on software development. This work also discusses bug report characteristics that could help identify duplicates.
IJERT-Duplicate bug report rate in oss projects a comparative analysis
International Journal of Engineering Research and Technology (IJERT), 2013
https://www.ijert.org/duplicate-bug-report-rate-in-oss-projects-a-comparative-analysis https://www.ijert.org/research/duplicate-bug-report-rate-in-oss-projects-a-comparative-analysis-IJERTV2IS100895.pdf The use of bug tracking systems in Open Source Software (OSS) to organize software maintenance activity is widespread. A bug tracking system is an open bug repository maintained by an open source software organization to track its bugs, so that bug reports from all over the world can be gathered. To improve the reliability of a software system, developers often allow users to submit a bug by writing a bug report in the bug tracking system over the internet. Bug tracking in an open bug repository is fully distributed and uncontrolled, so different reporters may submit reports for the same problem again and again. A report for the same problem submitted by several reporters is referred to as a duplicate bug report. Identifying these duplicate reports is time consuming and adds to the already high cost of software maintenance. Duplicate bug reports put extra overhead on software organizations, as they negatively affect software maintenance. So utility of these systems is hindered by excessive number of dup
Evaluating a Tool for Bug-report Analysis and Search
lbd.dcc.ufmg.br
Bug report tracking systems have been used to facilitate the maintenance and evolution of software. However, duplicate bug report entries in such systems can considerably reduce productivity within a software project, because duplicate entries demand more time for searching and analyzing bug reports. In this context, this paper presents the main problems caused by bug report duplication. In addition, it proposes a tool for bug report search and analysis (BAST) that helps avoid duplication, and presents a case study to evaluate the tool. For the evaluation, we compared BAST against a baseline tool in a private software testing company. The results showed that BAST performed better than the baseline, both in reducing analysis time and in reducing the number of duplicates submitted.
ANALYZING THE IMPACT OF SIMILARITY MEASURES IN DUPLICATE BUG REPORT DETECTION
IAEME PUBLICATION, 2020
Duplicate bug report detection is an important task performed when bug reports are assigned to the responsible developer. Because bug reports for open-source projects are submitted by people in many geographical locations, the submission process is uncoordinated, and this uncoordinated submission leads to duplicate bug reports. The bug report triager usually has to go through the tedious process of detecting duplicate bug reports manually. Automatic duplicate bug report detection eases this work. A survey shows that comparing bug reports using similarity measures is the best way to perform duplicate bug report detection, because unbalanced data causes an imbalance problem for machine learning approaches. In this paper, we analyze how different similarity measures affect duplicate bug report detection. For our analysis, we use the Levenshtein, Jaccard, Cosine, BM25, LSI, and K-Means similarity measures. Together, these measures cover Natural Language Processing, Machine Learning, and Information Retrieval techniques.
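To make three of the measures named in this abstract concrete, here is a minimal sketch of Jaccard, cosine, and Levenshtein computed on bug report summaries. The whitespace tokenizer, the function names, and the sample summaries are assumptions made for illustration; the paper's own preprocessing and implementation are not described in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real pipelines usually
    # also remove stop words and apply stemming.
    return text.lower().split()


def jaccard(a, b):
    # Size of the shared vocabulary divided by the size of the joint vocabulary.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    # Cosine of the angle between the two term-frequency vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def levenshtein(s, t):
    # Character-level edit distance, computed row by row.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    r1 = "app crashes on save"
    r2 = "app crashes on exit"
    print(jaccard(r1, r2), cosine(r1, r2), levenshtein(r1, r2))
```

Jaccard and cosine return a similarity in [0, 1], while Levenshtein returns a distance, so the latter would need normalizing (for example by the longer string's length) before the three can be compared on one scale.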
IJERT-A Survey on Automated Duplicate Detection in a Bug Repository
International Journal of Engineering Research and Technology (IJERT), 2014
https://www.ijert.org/a-survey-on-automated-duplicate-detection-in-a-bug-repository https://www.ijert.org/research/a-survey-on-automated-duplicate-detection-in-a-bug-repository-IJERTV3IS041769.pdf Whenever a new bug is reported in a bug repository, it is important to analyze the new report to determine whether it is a duplicate. A large body of work has addressed this problem. In this paper we review the previously proposed techniques and analyze the contribution each has made to establishing duplicate detection in bug repositories as an important area of research.
DURFEX: A Feature Extraction Technique for Efficient Detection of Duplicate Bug Reports
The detection of duplicate bug reports can help reduce the processing time of handling field crashes. This is especially important for software companies with a large client base, where multiple customers can submit bug reports caused by the same faults. There exist several techniques for the detection of duplicate bug reports; many of them rely on some sort of classification technique applied to information extracted from stack traces. They classify each report using the functions invoked in the stack trace associated with the bug report. The problem is that typical bug repositories may have stack traces that contain tens of thousands of functions, which causes the curse of dimensionality problem. In this paper, we propose a feature extraction technique that reduces the feature size and yet retains the information that is most critical for the classification. The proposed feature extraction approach starts by abstracting stack traces of function calls into sequences of package names, by replacing each function with the package in which it is defined. We then segment these traces into multiple N-grams of variable length and map them to fixed-size sparse feature vectors, which are used to measure the distance between the stack trace of an incoming bug report and the stack traces of a historical set of bug reports. A linear combination of stack trace similarity and non-textual fields such as component and severity is then used to measure the distance between a bug report and a historical set of bug reports. We show the effectiveness of our approach by applying it to the Eclipse bug repository, which contains tens of thousands of bug reports. Our approach outperforms the approach that uses distinct function names, while significantly reducing the processing time.
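The pipeline described in this abstract (functions abstracted to packages, variable-length N-grams, fixed-size sparse vectors, distance) can be sketched roughly as follows. This is an illustration under stated assumptions, not the DURFEX implementation: the abstract does not say how N-grams are mapped to fixed-size vectors, so the CRC32 hashing trick is used here as a stand-in, and the frame format, the `dim` and `max_n` defaults, and all function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def to_packages(frames):
    # Assumption: frames look like "pkg.sub.Class.method", so dropping the
    # last two dot-separated parts leaves the package name.
    return [frame.rsplit(".", 2)[0] for frame in frames]


def ngrams(seq, max_n=2):
    # All contiguous N-grams of length 1..max_n, shortest first.
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            grams.append(tuple(seq[i:i + n]))
    return grams


def sparse_vector(frames, dim=1024, max_n=2):
    # Hashing trick (an assumption, see lead-in): each N-gram is counted in
    # one of `dim` buckets, so vector size does not grow with the vocabulary.
    vec = Counter()
    for gram in ngrams(to_packages(frames), max_n):
        bucket = zlib.crc32("|".join(gram).encode("utf-8")) % dim
        vec[bucket] += 1
    return vec


def cosine_distance(u, v):
    # 0.0 for vectors pointing the same way, 1.0 for no shared buckets.
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


if __name__ == "__main__":
    # Made-up traces: different methods, same packages in the same order.
    incoming = ["org.eclipse.ui.Workbench.run", "org.eclipse.core.Job.execute"]
    historical = ["org.eclipse.ui.Window.open", "org.eclipse.core.Task.start"]
    print(cosine_distance(sparse_vector(incoming), sparse_vector(historical)))
```

Because both example traces reduce to the same package sequence, their distance is approximately zero even though no function name matches, which is the dimensionality reduction the abstract argues for. The linear combination with component and severity fields is left out of this sketch.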
An HMM-based approach for automatic detection and classification of duplicate bug reports
Information and Software Technology, 2019
Context: Software projects rely on their issue tracking systems to guide the maintenance activities of software developers. Bug reports submitted to issue tracking systems carry crucial information about the nature of the crash (such as text from users or developers and execution information about the functions running before the crash occurred). Typically, big software projects receive thousands of reports every day. Objective: The aim is to reduce the time and effort required to fix bugs while improving software quality overall. Previous studies have shown that a large number of bug reports are duplicates of previously reported ones. For example, as many as 30% of all reports for Firefox are duplicates. Method: While there exists a wide variety of approaches to automatically detect duplicate bug reports by natural language processing, only a few approaches have considered the execution information (the so-called stack traces) inside bug reports. In this paper, we propose a novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models. Results: When applying our approach to Firefox and GNOME datasets, we show that, for Firefox, the average recall at rank k=1 is 59% and at rank k=2 is 75.55%. We start reaching 90% recall from k=10. The Mean Average Precision (MAP) value is up to 76.5%. For GNOME, the recall at k=1 is around 63%, and this value increases by about 10% for k=2. The recall increases to 97% for k=11. A MAP value of up to 73% is achieved. Conclusion: We show that HMM and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
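The results above are reported as recall at rank k and Mean Average Precision. The abstract does not spell out the exact definitions used in the paper, so the sketch below follows the common information-retrieval formulation: a query counts as a hit at k if any true duplicate appears in its top k candidates, and MAP averages the per-query average precision. Function names and the toy rankings are made up for illustration.

```python
def recall_at_k(rankings, truths, k):
    # Fraction of queries with at least one true duplicate in the top k.
    hits = sum(1 for ranked, relevant in zip(rankings, truths)
               if relevant & set(ranked[:k]))
    return hits / len(rankings)


def average_precision(ranked, relevant):
    # Mean of the precision values taken at each rank where a relevant
    # report appears; 0.0 if the query has no relevant reports.
    if not relevant:
        return 0.0
    found = 0
    total = 0.0
    for position, report_id in enumerate(ranked, 1):
        if report_id in relevant:
            found += 1
            total += found / position
    return total / len(relevant)


def mean_average_precision(rankings, truths):
    # Average precision averaged over all queries.
    scores = [average_precision(r, t) for r, t in zip(rankings, truths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two hypothetical incoming reports with ranked candidate duplicates.
    rankings = [["b1", "b2", "b3"], ["b4", "b5", "b6"]]
    truths = [{"b1"}, {"b5"}]
    print(recall_at_k(rankings, truths, 1))          # 0.5
    print(recall_at_k(rankings, truths, 2))          # 1.0
    print(mean_average_precision(rankings, truths))  # 0.75
```

When every query has exactly one true duplicate, as in the toy data, average precision reduces to the reciprocal of the rank at which that duplicate is found.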
BAST - A Tool for Bug Report Analysis and Search
2009
Bug tracker systems have been used to facilitate the maintenance and evolution of software. However, duplicate bug report entries in such systems reduce development productivity. This happens mainly because developers must carefully analyze incoming bug reports to check their validity, and some reports can take more than 20 minutes to analyze. This work presents a tool for bug report search and analysis that aims to improve these tasks.
Effective Bug Tracking Systems: Theories and Implementation
IOSR Journal of Computer Engineering, 2012
Bug tracking is an essential discipline in the domain of software engineering. It has far-reaching effects on the system when used effectively. The information about bugs and solutions provided in bug reports can help software engineers act on them quickly and ensure that the bugs are either rectified or eliminated from the system. The bulk of information provided in bug reports may make it difficult for developers to identify poorly designed information. Bug tracking systems therefore need to be improved and should follow certain standards. To overcome this problem, we propose four fundamental directions for enhancing the effectiveness of bug tracking systems. To demonstrate the efficiency of the proposed directions, we develop a prototype application that tracks bugs effectively by capturing essential information from users and helps resolve bugs quickly.