How software repositories can help in resolving a new change request

PR-SZZ: How pull requests can support the tracing of defects in software repositories

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)

The SZZ algorithm is a standard way to identify bug-fixing commits and their bug-inducing counterparts. It forms the basis for data sets used in numerous empirical studies. Since its creation, multiple extensions have been proposed to enhance its performance. For historical reasons, related work relies on commit messages to map bug tickets to possibly related code, with no additional data used to trace inducing commits from these fixes. Therefore, we present an updated version of SZZ utilizing pull requests, which are widely adopted today. We evaluate our approach in comparison to existing SZZ variants by conducting experiments and analyzing the usage of pull requests, inner commits, and merge strategies. We base our results on 6 open-source projects with more than 50k commits and 35k pull requests. With respect to bug-fixing commits, on average 18% of bug tickets can be additionally mapped to a fixing commit, resulting in an overall F-score of 0.75, an improvement of 40 percentage points. By selecting an inducing commit, we manage to reduce the false positives and increase precision by on average 16 percentage points in comparison to existing approaches.

Index Terms: SZZ, defect data set, bug fixing changes, bug inducing changes, mining software repositories
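
The linking step described in this abstract can be illustrated with a small sketch. It is not the PR-SZZ implementation: the ticket-key pattern, the `Commit` and `PullRequest` records, and the function names are all assumptions made for illustration. It only shows how pull request text could serve as a second source of ticket-to-commit links next to commit messages, and how an F-score is computed from precision and recall.

```python
import re
from dataclasses import dataclass

# Hypothetical ticket-key pattern such as "PROJ-123"; real trackers differ.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


@dataclass
class Commit:
    sha: str
    message: str


@dataclass
class PullRequest:
    title: str
    body: str
    merge_commit: str  # sha of the commit that landed on the main branch


def link_by_commit_message(commits):
    """Map ticket key -> set of commit shas whose message mentions the key."""
    links = {}
    for commit in commits:
        for key in TICKET_RE.findall(commit.message):
            links.setdefault(key, set()).add(commit.sha)
    return links


def link_by_pull_request(pull_requests):
    """Map ticket key -> merge commits of pull requests that mention the key."""
    links = {}
    for pr in pull_requests:
        for key in TICKET_RE.findall(pr.title + "\n" + pr.body):
            links.setdefault(key, set()).add(pr.merge_commit)
    return links


def merge_links(first, second):
    """Union two ticket -> commits mappings without mutating either input."""
    merged = {key: set(shas) for key, shas in first.items()}
    for key, shas in second.items():
        merged.setdefault(key, set()).update(shas)
    return merged


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In this toy setup, a ticket that is mentioned only in a pull request description (and not in any commit message) still receives a fixing commit once the two mappings are merged, which is the kind of additional mapping the abstract reports.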

An empirical study of supplementary patches in open source projects

Empirical Software Engineering, 2016

Developers occasionally make more than one patch to fix a bug. The related patches sometimes are intentionally separated, but unintended omission errors require supplementary patches. Several change recommendation systems have been suggested based on clone analysis, structural dependency, and historical change coupling in order to reduce or prevent incomplete patches. However, very few studies have examined the reason that incomplete patches occur and how real-world omission errors could be reduced. This paper systematically studies a group of bugs that were fixed more than once in open source projects in order to understand the characteristics of incomplete patches. Our study on Eclipse JDT core, Eclipse SWT, Mozilla, and Equinox p2 showed that a significant portion of the resolved bugs require more than one attempt to fix. Compared to single-fix bugs, the multi-fix bugs did not have a lower quality of bug reports, but more attribute changes (i.e., cc'ed developers or title) were made to the multi-fix bugs than to the single-fix bugs. Multi-fix bugs are more likely to have high severity levels than single-fix bugs. Hence, more developers have participated in discussions about multi-fix bugs compared to single-fix bugs. Multi-fix bugs take more time to resolve than single-fix bugs do. Incomplete patches are longer and more scattered, and they are related to more files than regular patches are. Our manual inspection showed that the causes of incomplete patches were diverse, including missed porting updates, incorrect handling of conditional statements, and incomplete

An empirical study of supplementary bug fixes

2012

A recent study finds that errors of omission are harder for programmers to detect than errors of commission. While several change recommendation systems already exist to prevent or reduce omission errors during software development, there have been very few studies on why errors of omission occur in practice and how such errors could be prevented. In order to understand the characteristics of omission errors, this paper investigates a group of bugs that were fixed more than once in open source projects: those bugs whose initial patches were later considered incomplete and to which programmers applied supplementary patches.

Wayback Machine: A tool to capture the evolutionary behaviour of the bug reports and their triage process in open-source software systems

Journal of Systems and Software, 2022

The issue tracking system (ITS) is a rich data source for data-driven decision-making. Different characteristics of bugs, such as severity, priority, and time to fix, provide a clear picture of an ITS. Nevertheless, such information may be misleading. For example, the exact time and the effort spent on a bug might be significantly different from the actual reporting time and the fixing time. Similarly, these values may be subjective, e.g., severity and priority values are assigned based on the intuition of a user or a developer rather than a structured and well-defined procedure. Hence, we explore the evolution of the bug dependency graph together with priority and severity levels to explore the actual triage process. Inspired by the idea of the "Wayback Machine" for the World Wide Web, we aim to reconstruct the historical decisions made in the ITS. Therefore, any bug prioritization or bug triage algorithms/scenarios can be applied in the same environment using our proposed ITS Wayback Machine. More importantly, we track the evolutionary metrics in the ITS when a custom triage/prioritization strategy is employed. We test the efficiency of the proposed algorithm using data extracted from three open-source projects. Our empirical study sheds light on the overlooked evolutionary metrics (e.g., overdue bugs and developers' loads), which are facilitated via our proposed past-event re-generator.
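
The idea of regenerating past events can be sketched as an event replay. The following is a minimal illustration, not the tool from the paper: the `Event` record, its attribute names, and the `RESOLVED` status value are assumptions. It rebuilds the state of every bug as of a chosen moment and derives one evolutionary metric, the number of open bugs per assignee.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    when: datetime
    bug_id: int
    attribute: str  # e.g. "status", "priority", "severity", "assignee"
    value: str


def snapshot(events, at):
    """Replay events in chronological order up to `at` (inclusive)."""
    state = {}
    for event in sorted(events, key=lambda e: e.when):
        if event.when > at:
            break
        state.setdefault(event.bug_id, {})[event.attribute] = event.value
    return state


def open_bugs_by_assignee(state):
    """Count unresolved, assigned bugs per developer in a snapshot."""
    load = {}
    for attributes in state.values():
        if attributes.get("status") != "RESOLVED" and "assignee" in attributes:
            developer = attributes["assignee"]
            load[developer] = load.get(developer, 0) + 1
    return load
```

Calling `snapshot` for a series of dates and feeding each result to `open_bugs_by_assignee` gives a time series of developer load, against which a custom triage strategy could be compared.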

Automated classification of change messages in open source projects

ACM Symposium on Applied Computing, 2008

Source control systems permit developers to attach a free-form message to every committed change. The content of these change messages can support software maintenance activities. We present an automated approach to classify a change message as either a bug fix, a feature introduction, or a general maintenance change. Researchers can study the evolution of a project using our classification. For example, researchers

Software change management

Computer, 1996

Software projects change rapidly, which makes efficient change management a major challenge for the software industry. This challenge was poorly met for years because of rudimentary tools. Source code changes, for example, were managed with stand-alone, file-based version control systems. Other changes (in text specification and planning documents, cost estimates, test libraries, graphics and illustrations, and bug inventories) were managed with limited tools that rarely communicated across domains. It has been recognized since at least 1990 that for many projects, source code isn't the only deliverable that changes. Frequently, the number of words created to deal with a software project significantly exceeds the size of the source code, and the words change more rapidly. Bug reports, for example, undergo constant change during software development and maintenance as bugs are reported, tested, fixed, and documented. Modern change management, or configuration control, tools must encompass changes affecting every kind of software deliverable and artifact: requirements, project plans, project cost estimates, contracts, design, source code, user documents, illustrations and graphics, test materials, and bug reports. Ideally, these tools would use hypertext to handle cross-references among deliverables so that when something changes, corresponding material is modified appropriately.

Assigning change requests to software developers

Journal of Software: Evolution and Process, 2011

The paper presents an approach to recommend a ranked list of expert developers to assist in the implementation of software change requests (e.g., bug reports and feature requests). An Information Retrieval (IR)-based concept location technique is first used to locate source code entities, e.g., files and classes, relevant to a given textual description of a change request. The previous commits from version control repositories of these entities are then mined for expert developers. The role of the IR method in selectively reducing the mining space is different from previous approaches that textually index past change requests and/or commits. The approach is evaluated on change requests from three open-source systems: ArgoUML, Eclipse, and KOffice, across a range of accuracy criteria. The results show that the overall accuracies of the correctly recommended developers are between 47 and 96% for bug reports, and between 43 and 60% for feature requests. Moreover, comparison results with two other recommendation alternatives show that the presented approach outperforms them by a substantial margin. Project leads or developers can use this approach in maintenance tasks immediately after the receipt of a change request in a free-form text.

[...] a bug, or a feature. Clearly, this activity is reactive and may not necessarily yield an effective or efficient answer. An active developer of ArgoUML, where this activity is manual, stated that they would welcome any tool that would lead to a more enjoyable and efficient job experience, and is not perceived as a hindrance. In open-source software development, where much relies on volunteers, it could serve as a catalyst if there was a tool that automatically mapped change requests to appropriate developers. That is, developers do not have to wade through the numerous change requests to look for what they can contribute to; they are presented a 'filtered' set of change requests that suits their palates instead.
Both help seekers and sustained software evolution in such a situation would greatly benefit from a proactive approach that automatically recommends the appropriate developers based solely on information available in textual change requests. Change requests are typically specified in a free-form textual description using natural language (e.g., a bug reported to the Bugzilla system of a software project).
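
The two-stage pipeline described above (locate relevant source files from the request text, then mine their commit history for developers) can be sketched as follows. This is an illustration only: plain TF-IDF with cosine similarity stands in for whatever IR technique the paper uses, and `recommend_developers`, `source_files`, `commit_authors`, and `top_files` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """docs: name -> text. Returns (name -> {term: weight}, idf table)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()}
        for name, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_developers(request, source_files, commit_authors, top_files=2):
    """Rank developers by their commit counts on the files most similar to the request.

    source_files:   file name -> text (identifiers, comments) of the file
    commit_authors: file name -> list of author names, one per past commit
    """
    vectors, idf = tfidf_vectors(source_files)
    # Terms unseen in the source files get weight 0 and do not affect ranking.
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(request)).items()}
    ranked = sorted(source_files, key=lambda f: cosine(query, vectors[f]), reverse=True)
    votes = Counter()
    for name in ranked[:top_files]:
        votes.update(commit_authors.get(name, []))
    return [developer for developer, _ in votes.most_common()]
```

Restricting the vote to the top-ranked files reflects the point made in the abstract: the retrieval step narrows the mining space before the commit history is consulted.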

Mining Effort Data from the OSS Repository of Developer's Bug Fix Activity

2010

During the evolution of any software, efforts are made to fix bugs or to add new features. In software engineering, a history of effort data is required to build an effort estimation model, which estimates the cost and complexity of any software. Therefore, the role of effort data is indispensable to build state-of-the-art effort estimation models. Most Open Source Software projects do not maintain any effort-related information. Consequently, there is no state-of-the-art effort estimation model for Open Source Software, whereas most of the existing effort models are for commercial software. In this paper we present an approach to build an effort estimation model for Open Source Software. For this purpose we suggest mining effort data from the history of the developer's bug fix activities. Our approach determines the actual time spent to fix a bug, and considers it as an estimated effort. Initially, we use the developer's bug-fix-activity data to construct the developer's activity log-book. The log-book is used to store the actual time elapsed to fix a bug. Subsequently, the log-book information is used to mine the bug fix effort data. Furthermore, the developer's bug fix activity data is used to define three different measures for the developer's contribution or expertise level. Finally, we used the bug-fix-activity data to visualize the developer's collaborations and the involved source files. In order to perform an experiment we selected the Mozilla open source project and downloaded 93,607 bug reports from the Mozilla project bug tracking system, i.e., Bugzilla. We also downloaded the available CVS-log data from the Mozilla project repository. In this study we reveal that in the case of Mozilla only 4.9% of developers have been involved in fixing 71.5% of the reported bugs.
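
A log-book of this kind can be sketched in a few lines. The record layout and function names below are assumptions, and treating the span between assignment and resolution as the "actual time" is a simplification chosen for illustration, not necessarily the paper's definition. The last function shows how a concentration figure like the one reported for Mozilla could be derived from such a log-book.

```python
from datetime import datetime


def elapsed_hours(assigned_at, resolved_at):
    """Wall-clock hours between two timestamps."""
    return (resolved_at - assigned_at).total_seconds() / 3600.0


def build_logbook(fixes):
    """fixes: iterable of (developer, bug_id, assigned_at, resolved_at).

    Returns developer -> list of (bug_id, hours spent).
    """
    logbook = {}
    for developer, bug_id, assigned_at, resolved_at in fixes:
        entry = (bug_id, elapsed_hours(assigned_at, resolved_at))
        logbook.setdefault(developer, []).append(entry)
    return logbook


def developer_share_for(logbook, bug_share):
    """Smallest fraction of developers who together fixed at least
    `bug_share` (between 0 and 1) of all logged bugs."""
    counts = sorted((len(entries) for entries in logbook.values()), reverse=True)
    total = sum(counts)
    covered = 0
    for rank, count in enumerate(counts, start=1):
        covered += count
        if covered >= bug_share * total:
            return rank / len(counts)
    return 1.0
```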