Bug Localization Using Revision Log Analysis and Open Bug Repository Text Categorization

Code Complexity and Version History for Enhancing Hybrid Bug Localization

IEEE Access, 2021

Software projects are not free of bugs when they are released, so developers keep receiving bug reports that describe technical issues. The process of identifying the buggy code files that correspond to the submitted bug reports is called bug localization. Automating the bug localization process can speed up bug fixing and improve developer productivity, especially when the number of submitted bug reports is large. Several automatic bug localization approaches have been proposed in the literature, based on the textual and/or semantic similarity between bug reports and source code files. Nevertheless, none of the previous approaches made use of source code complexity despite its importance: high-complexity source code files are more likely to be modified than low-complexity files and are more prone to bugs. To improve the accuracy of automatic bug localization, this paper proposes a Hybrid Bug Localization (HBL) approach that makes full use of the textual and semantic features of source code files and previously fixed bug reports, in addition to source code complexity and version history properties. The effectiveness of the proposed approach was assessed using three open-source Java projects of different sizes: ZXing, SWT, and AspectJ. Experimental results showed that the proposed approach outperforms several state-of-the-art approaches in terms of the mean average precision (MAP) and mean reciprocal rank (MRR) metrics.
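
One way such a hybrid approach can combine its evidence sources is a weighted sum over normalized per-file scores. The sketch below is illustrative only: the weights, file names, and score values are made up, not taken from the HBL paper.

```python
def hybrid_score(textual_sim, complexity, churn, weights=(0.6, 0.2, 0.2)):
    """Combine per-file evidence into a single ranking score.

    textual_sim, complexity, and churn (version-history activity) are
    assumed to be normalized to [0, 1]; the weights are illustrative.
    """
    w_t, w_c, w_h = weights
    return w_t * textual_sim + w_c * complexity + w_h * churn

def rank_files(files):
    # files: {name: (textual_sim, complexity, churn)}
    scored = {name: hybrid_score(*feats) for name, feats in files.items()}
    return sorted(scored, key=scored.get, reverse=True)

files = {
    "Parser.java": (0.80, 0.90, 0.70),  # similar text AND complex, churned
    "Utils.java":  (0.85, 0.10, 0.05),  # similar text but stable and simple
    "Render.java": (0.20, 0.30, 0.40),
}
ranking = rank_files(files)
```

Note how complexity and churn break the near-tie in textual similarity, which is the intuition the abstract describes.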

Locating Bug IDs and Development Logs in Open Source Software (OSS) projects: An Experience Report

2018 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), 2018

The development logs of software projects, contained in Version Control (VC) systems, can be severely incomplete when tracking bugs, especially in open-source projects, resulting in reduced traceability of defects. At other times, such logs can contain bug information that is not available in bug tracking system (BT system) repositories, and vice versa: if development logs and BT system data were used together, researchers and practitioners would often have a larger set of bug IDs for a software project and a better picture of a bug's life cycle, evolution, and maintenance. Considering a sample of 10 OSS projects and their development logs and BT system data, the two objectives of this paper are (i) to determine which of the keywords 'Fix' and 'Bug' or the '#' identifier provides better precision, and (ii) to analyse their respective precision and recall at manually locating as many bug IDs as possible. Overall, our results suggest that the use of the '#' identifier in conjunction with the bug ID digits (e.g., #1234) is more precise for locating bugs in development logs than the use of the 'Bug' and 'Fix' keywords. Such keywords are indeed present in the development logs, but they are less useful when trying to connect development actions with the bug traces in a software project.
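
The two matching strategies the paper compares can be sketched with two regular expressions over commit messages; the sample log lines below are invented for illustration.

```python
import re

# '#' followed by bug ID digits, e.g. #1234
BUG_REF = re.compile(r"#\d+")
# the 'Bug'/'Fix' keyword heuristic, case-insensitive, whole words only
KEYWORDS = re.compile(r"\b(?:bug|fix)\b", re.IGNORECASE)

def find_bug_refs(message):
    return BUG_REF.findall(message)

log = [
    "Fix crash on startup, see #1234",
    "bug in parser handled",
    "Refactor build scripts",
]
refs = [find_bug_refs(m) for m in log]
keyword_hits = [bool(KEYWORDS.search(m)) for m in log]
```

The second message shows why the keyword heuristic is less useful for traceability: it flags a bug-related commit but yields no bug ID to link back to the BT system.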

Managing Open Bug Repositories through Bug Report Prioritization Using SVMs

Managing the incoming deluge of new bug reports received in the bug repository of a large open-source project is a challenging task. Handling these reports manually consumes developers' time and resources, which delays the resolution of crucial bugs that need to be identified and resolved early to prevent major losses in a software project. In this paper, we present a machine learning approach to developing a bug priority recommender that automatically assigns an appropriate priority level to newly arrived bugs, so that they are resolved in order of importance and an important bug is not left untreated for a long time. Our approach is based on a classification technique, for which we use Support Vector Machines. Experimental evaluation of our recommender using precision and recall measures reveals the feasibility of our approach for automatic bug priority assignment.
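
The per-priority precision and recall measures used in such an evaluation can be computed as below; the priority labels and predictions are invented for illustration, not taken from the paper.

```python
def precision_recall(pred, gold, label):
    """Precision and recall for one priority level (e.g. 'P1')."""
    tp = sum(p == label and g == label for p, g in zip(pred, gold))
    fp = sum(p == label and g != label for p, g in zip(pred, gold))
    fn = sum(p != label and g == label for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold = ["P1", "P2", "P1", "P3", "P1"]  # triager-assigned priorities
pred = ["P1", "P1", "P1", "P3", "P2"]  # recommender output
p, r = precision_recall(pred, gold, "P1")
```

Reporting these per priority level matters because the rare, highest-priority class is exactly the one whose recall the recommender must not sacrifice.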

Improving Bug Localization using IR-based Textual Similarity and Vectorization Scoring Framework

2020

A major challenge faced by the software industry is meeting deadlines while delivering a quality product. The main reason behind delays is not only the development work itself but, fundamentally, the detection and localization of bugs and errors. Whenever a bug is reported, developers use the bug report to reach the code fragments that need to be modified to fix the bug. Bug reports contain suitable semantic information, yet developers otherwise resort to exhaustive manual searching to find the bug location. To minimize this manual effort, a framework for information-retrieval-based bug localization is proposed that exploits the textual content of the bug report to rank relevant buggy source files, i.e. the files having a higher probability of containing the bug. The dataset used consists of a total of 925 bugs from four projects: SWT, ZXing, Eclipse and AspectJ. The framework outputs the top-N results, here the top 5 ranked (related) terms, identifying the files containing these terms as having a higher probability ...
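
The core of such an IR-based framework is scoring each source file against the bug report text, e.g. with TF-IDF weighting and cosine similarity. The sketch below is a minimal stand-in with invented file contents, not the paper's scoring framework.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: {name: token list}. Returns {name: {term: tf-idf weight}}."""
    n = len(docs)
    df = Counter(t for toks in docs.values() for t in set(toks))
    vecs = {}
    for name, toks in docs.items():
        tf = Counter(toks)
        vecs[name] = {t: tf[t] * math.log(1 + n / df[t]) for t in tf}
    return vecs

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

source = {
    "Button.java": "button click handler widget".split(),
    "Parser.java": "parse token stream grammar".split(),
}
report = "crash when button click".split()
vecs = tfidf_vectors({**source, "report": report})
query = vecs.pop("report")
ranked = sorted(source, key=lambda f: cosine(query, vecs[f]), reverse=True)
```

Files sharing report terms ("button", "click") rise to the top of the ranked list handed to the developer.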

Automated classification of software issue reports using machine learning techniques: an empirical study

Innovations in Systems and Software Engineering, 2017

Software developers, testers and customers routinely submit issue reports to software issue trackers to record the problems they face in using software. The issues are then directed to appropriate experts for analysis and fixing. However, submitters often misclassify an improvement request as a bug and vice versa. This costs valuable developer time. Hence, automated classification of the submitted reports would be of great practical utility. In this paper, we analyze how machine learning techniques may be used to perform this task. We apply different classification algorithms, namely naive Bayes, linear discriminant analysis, k-nearest neighbors, support vector machine (SVM) with various kernels, decision tree and random forest, separately to classify the reports from three open-source projects. We evaluate their performance in terms of F-measure, average accuracy and weighted average F-measure. Our experiments show that random forests perform best, while SVMs with certain kernels also achieve high performance.
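
The weighted average F-measure used to compare the classifiers weights each class's F-measure by its support, so that a frequent class dominates a rare one. A small worked sketch (the class names and scores are invented):

```python
def f_measure(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def weighted_avg_f(per_class):
    """per_class: {label: (precision, recall, support)}."""
    total = sum(s for _, _, s in per_class.values())
    return sum(f_measure(p, r) * s for p, r, s in per_class.values()) / total

scores = {
    "bug":         (0.90, 0.80, 60),  # 60 reports in the test set
    "improvement": (0.70, 0.75, 40),
}
wf = weighted_avg_f(scores)
```

Here the "bug" class, with the larger support, pulls the weighted average toward its own F-measure.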

Mining Software Repositories for Defect Categorization

Journal of Communications Software and Systems, 2015

Early detection of software defects is very important to decrease software cost and consequently increase software quality. The success of software companies not only depends on gaining knowledge about software defects, but is largely reflected in the manner in which information about defects is collected and used. In the software industry, individuals at different levels, from customers to engineers, apply diverse mechanisms to assign defects to a particular class. Categorizing bugs based on their characteristics helps the software development team take appropriate actions to reduce similar defects that might get reported in future releases. Classification, if performed manually, consumes considerable time and effort, and human resources with expert testing skills and domain knowledge are required for labeling the data. Therefore, the need for automatic classification of software defects is high. This work attempts to categorize defects by proposing an algorithm called Software Defect CLustering (SDCL). It aims at mining existing online bug repositories such as Eclipse, Bugzilla and JIRA to analyze defect descriptions and categorize them. The proposed algorithm is designed using text clustering and works with three major modules to find the class to which a defect should be assigned. Software bug repositories hold software defect data with attributes such as defect description, status, and defect open and close dates. The defect extraction module extracts defect descriptions from the various bug repositories and converts them into a unified format for further processing. Unnecessary and irrelevant text is removed from the defect data by the data preprocessing module. Finally, the defect data are grouped into clusters of similar defects using a clustering technique. The algorithm provides classification accuracy of more than 80% on all three of the above-mentioned repositories.
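
The final clustering step can be sketched with a simple single-pass scheme over preprocessed defect descriptions; this is an illustrative stand-in (Jaccard overlap, invented threshold and data), not the SDCL algorithm itself.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_defects(descriptions, threshold=0.25):
    """Single-pass text clustering: attach each defect to the most
    similar existing cluster (compared against its first member), or
    start a new cluster when nothing is similar enough."""
    clusters = []  # each cluster is a list of token lists
    for toks in descriptions:
        best, best_sim = None, 0.0
        for c in clusters:
            sim = jaccard(toks, c[0])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best.append(toks)
        else:
            clusters.append([toks])
    return clusters

defects = [
    "null pointer exception in editor".split(),
    "editor throws null pointer exception".split(),
    "ui freezes on large file open".split(),
]
groups = cluster_defects(defects)
```

The two null-pointer descriptions land in one cluster while the UI freeze starts its own, mirroring the grouping-of-similar-defects module described above.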

Bug Report Triaging Using Textual, Categorical and Contextual Features Using Latent Dirichlet Allocation

Software bugs occur for a wide range of reasons. Bug reports can be generated automatically or drafted by the user of the software. Bug reports can also accompany other malfunctions of the software, mostly in beta or unstable versions. Most often, these bug reports are enriched with user-contributed experiences describing what the user actually faced. Addressing these bugs accounts for the majority of the effort spent in the maintenance phase of a software project's life cycle. Most often, several bug reports sent by different users correspond to the same defect. Nevertheless, every bug report has to be analyzed separately and carefully for the possibility of a potential bug. The person responsible for processing newly reported bugs, checking for duplicates and passing them to suitable developers to get fixed is called a triager, and this process is called triaging. The utility of bug tracking systems is hindered by the large number of duplicate bug reports; in many open-source software projects, as many as one third of all reports are duplicates. Identifying duplicate bug reports is time-consuming and adds to the already high cost of software maintenance. In this dissertation, a model of the automated triaging process is proposed based on textual, categorical and contextual similarity features. The contribution of this dissertation is twofold. In the proposed scheme, a total of 80 textual features are extracted from the bug reports. Moreover, topics are modeled from the complete text corpus using Latent Dirichlet Allocation (LDA). These topics are specific to a category, class or functionality of the software; for example, possible topics for an Android bug repository might be Bluetooth, Download, Network, etc. Bug reports are analyzed for context to relate them to the domain-specific topics of the software, thereby enhancing the feature set that is used for tabulating the similarity score.
Finally, two sets are formed, for duplicate and non-duplicate bug reports, for binary classification using a Support Vector Machine. Simulation is performed over a dataset from Bugzilla. The proposed system improves the efficiency of duplicate checking by 15% as compared to the contextual model proposed by Anahita Alipour et al. The system is able to reduce development cost by improving duplicate checking while allowing at least one bug report for each real defect to reach developers.
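
To show the shape of the contextual features, the sketch below replaces LDA with hand-written topic word lists (a loud simplification: real topics would be learned from the corpus) and compares two reports through their topic-feature vectors.

```python
import math

TOPICS = {  # hand-written stand-ins for LDA-discovered topics
    "Bluetooth": {"bluetooth", "pairing", "headset"},
    "Download":  {"download", "progress", "resume"},
    "Network":   {"network", "wifi", "connection"},
}

def topic_features(tokens):
    """Fraction of each topic's vocabulary present in the report."""
    toks = set(tokens)
    return {name: len(toks & words) / len(words) for name, words in TOPICS.items()}

def contextual_similarity(a, b):
    # cosine similarity over the topic-feature vectors
    fa, fb = topic_features(a), topic_features(b)
    dot = sum(fa[k] * fb[k] for k in TOPICS)
    na = math.sqrt(sum(v * v for v in fa.values()))
    nb = math.sqrt(sum(v * v for v in fb.values()))
    return dot / (na * nb) if na and nb else 0.0

r1 = "bluetooth headset pairing fails".split()
r2 = "cannot complete pairing with bluetooth device".split()
r3 = "download progress bar stuck".split()
```

Two Bluetooth reports score high against each other and near zero against the download report, even where they share few exact words; that is the signal the contextual feature set adds on top of textual similarity.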

Online SVM based Optimized Bug Report Triaging using Feature Extraction

Triage is a medical term referring to the process of prioritizing patients based on the severity of their condition so as to maximize benefit (help as many as possible) when resources are limited. Bug report triaging is a process where tracker issues are screened and prioritized. Triage should help ensure that all reported issues are properly managed: bugs as well as improvements and feature requests. The large number of new bug reports received in the bug repositories of software systems makes their management a challenging task. Handling these reports manually is time-consuming and often results in delaying the resolution of important bugs. The most critical issue with bug reports is that their number is vast and most of them are duplicates of previously submitted reports. The solution to this problem requires that bug reports be categorized into groups, where each group consists of all the bug reports that belong to the same bug, and the number of groups equals the number of unique bugs addressed so far. A bug report corresponding to a new bug is placed in a separate group, followed by its duplicates, if any. Classifying whether a bug report that arrived from a user, written in natural language, is a duplicate or a unique report is a time-consuming task, especially when the number of received bug reports is large. Thus, this process needs to be automated. Bug reports have textual, contextual and categorical features, and these features need to be extracted to check for duplicates and non-duplicates. Moreover, within a group of reports, a particular report can be designated as the master, and all the reports that correspond to the same bug are linked to it. Thus, duplicates need not be discarded, so that a complete description of the bug can be provided later. In this paper, a much more extended set of textual features is considered for bug report duplicate checking.
A Support Vector Machine classifier is used to classify an incoming bug report as duplicate or non-duplicate. The simulation of the prescribed model is done using the R statistical package. A sample of bug reports from the Mozilla repository is considered. The results of the simulation model establish that the proposed classifier has higher efficiency compared to the existing BM25F technique, which employs 25 feature sets.
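
The master/duplicate grouping described above maps onto a simple index structure; the class and report IDs below are illustrative, not taken from the paper.

```python
class TriageIndex:
    """One master report per unique bug; duplicates are linked to the
    master rather than discarded, so the full bug description survives."""

    def __init__(self):
        self.groups = {}  # master report id -> list of duplicate ids

    def add_master(self, report_id):
        self.groups[report_id] = []

    def add_duplicate(self, report_id, master_id):
        self.groups[master_id].append(report_id)

    def full_description_ids(self, master_id):
        # master first, then every duplicate: the complete bug picture
        return [master_id] + self.groups[master_id]

idx = TriageIndex()
idx.add_master("BR-1")
idx.add_duplicate("BR-7", "BR-1")
idx.add_duplicate("BR-9", "BR-1")
```

The number of keys in `groups` equals the number of unique bugs seen so far, exactly as the text requires.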

Vulnerability identification and classification via text mining bug databases

IECON 2014 - 40th Annual Conference of the IEEE Industrial Electronics Society, 2014

As critical and sensitive systems increasingly rely on complex software systems, identifying software vulnerabilities is becoming increasingly important. It has been suggested in previous work that some bugs are only identified as vulnerabilities long after the bug has been made public. These bugs are known as Hidden Impact Bugs (HIBs). This paper presents a hidden impact bug identification methodology by means of text mining bug databases. The presented methodology utilizes the textual description of the bug report for extracting textual information. The text mining process extracts syntactical information of the bug reports and compresses the information for easier manipulation. The compressed information is then utilized to generate a feature vector that is presented to a classifier. The proposed methodology was tested on Linux vulnerabilities that were discovered in the time period from 2006 to 2011. Three different classifiers were tested and 28% to 88% of the hidden impact bugs were identified correctly by using the textual information from the bug descriptions alone. Further analysis of the Bayesian detection rate showed the applicability of the presented method according to the requirements of a development team.
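
One common way to "compress" extracted textual information into a fixed-length feature vector for a classifier is the hashing trick; this sketch is an assumption about the general technique, not the paper's specific pipeline.

```python
import hashlib

def hashed_features(tokens, dim=16):
    """Compress a bag of words into a fixed-length count vector by
    hashing each token into one of `dim` buckets (hashing trick)."""
    vec = [0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1
    return vec

# invented bug-description text for illustration
desc = "buffer overflow in packet parser allows remote crash".split()
vec = hashed_features(desc)
```

Every description maps to the same vector length regardless of vocabulary size, which keeps the representation small and easy to feed to a classifier, at the cost of occasional bucket collisions.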

A Supervised Bug Report Classification with Incorporate and Textual field Knowledge

Procedia Computer Science, 2018

The performance of a bug prediction model depends directly on the misclassification of bug reports; misclassification sacrifices the accuracy of the system. Resolving this issue requires manual examination of bug reports, which is a very time-consuming and tedious job for developers and testers. In this paper, a hybrid approach merging text mining, natural language processing and machine learning techniques is used to identify a report as a bug or a non-bug. Four incorporated fields, together with the textual fields, are added to bug reports to improve classifier performance. TF-IDF and bigram feature extraction methods are used with feature selection and a k-nearest neighbor (K-NN) classifier. The performance of the proposed system is evaluated using precision, recall and F-measure on five datasets. It is observed that the performance of the K-NN classifier varies with the dataset, and that adding the bigram method improves classifier performance.
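
The unigram-plus-bigram K-NN pipeline can be sketched as below; the feature weighting, similarity measure, and training examples are simplified stand-ins, not the paper's exact setup.

```python
from collections import Counter

def features(tokens):
    # unigrams plus adjacent-word bigrams
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

def similarity(a, b):
    # overlap of shared feature counts (simple stand-in for TF-IDF cosine)
    return sum(min(a[t], b[t]) for t in set(a) & set(b))

def knn_classify(query, train, k=3):
    """train: list of (token list, label). Majority vote over the k
    most similar labeled reports."""
    q = features(query)
    ranked = sorted(train, key=lambda ex: similarity(q, features(ex[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [
    ("app crashes with null pointer".split(), "bug"),
    ("crash on save null pointer".split(), "bug"),
    ("please add dark mode".split(), "non-bug"),
    ("feature request add export".split(), "non-bug"),
]
label = knn_classify("null pointer crash when saving".split(), train)
```

The bigram "null pointer" matches as a single feature, which is the kind of extra signal the abstract credits for the bigram method's improvement over unigrams alone.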