An approach to classify software maintenance requests
Related papers
Categorizing software applications for maintenance
2011 27th IEEE International Conference on Software Maintenance (ICSM), 2011
Software repositories hold applications that are often categorized to improve the effectiveness of various maintenance tasks. Properly categorized applications allow stakeholders to identify requirements related to their applications and to predict maintenance problems in software projects. Unfortunately, for various legal and organizational reasons, source code is often unavailable, which makes it difficult to automatically categorize the binary executables of software applications.
A case study of a maintenance support system
Proceedings of International Conference on Software Maintenance, 1995
One of the problems in maintaining a system installed at many sites and served by many support centres is that many problems are reported several times, which creates a large amount of extra work for the system maintenance organization. To reduce this problem, we have built a trouble report filter that removes more than 30% of all repeated trouble reports, even when the problem is described in different terms or at different system levels. The work consists of building a classification model based on the system description, the terms used in the trouble reports and the relations between these terms, and then using measures of term distance and term importance to compute a trouble report distance. The model can be tuned for maximum efficiency by varying the distance and importance measures. After describing the model and the term network, we describe two experiments with real data. The results show that it is possible to build a simple but highly efficient model for recognizing repeated trouble reports.
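The trouble report distance above combines term importance with term distance in a term network. A minimal sketch, assuming inverse document frequency as the importance measure and shortest-path hops in a small hand-built term graph as the term distance; the graph, the 0.35 threshold and every function name here are hypothetical illustrations, not the 1995 model.

```python
import math
import re
from collections import deque


def tokenize(text):
    """Lowercase a trouble report and split it into word terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(reports):
    """Assumed term importance: smoothed inverse document frequency."""
    n = len(reports)
    df = {}
    for report in reports:
        for term in set(tokenize(report)):
            df[term] = df.get(term, 0) + 1
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def term_distance(a, b, graph, max_hops=2):
    """Hops between two terms in a hypothetical term network (BFS).
    Terms farther apart than max_hops get distance max_hops + 1."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        term, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in graph.get(term, ()):
            if nxt == b:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return max_hops + 1


def report_distance(r1, r2, graph, weights, max_hops=2):
    """Importance-weighted mean of each term's distance to its closest
    term in the other report, normalized to [0, 1]."""
    t1, t2 = set(tokenize(r1)), set(tokenize(r2))
    if not t1 or not t2:
        return 1.0
    total = weight_sum = 0.0
    for src, dst in ((t1, t2), (t2, t1)):
        for term in src:
            w = weights.get(term, 1.0)
            best = min(term_distance(term, o, graph, max_hops) for o in dst)
            total += w * best / (max_hops + 1)
            weight_sum += w
    return total / weight_sum


def is_repeat(new_report, old_reports, graph, weights, threshold=0.35):
    """Filter rule: a report is a repeat if it is close to any earlier one."""
    return any(report_distance(new_report, old, graph, weights) <= threshold
               for old in old_reports)
```

Lowering the hypothetical threshold makes the filter stricter; tuning it and the weighting scheme loosely mirrors the abstract's point that the model is tuned by varying its distance and importance measures.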
Panning requirement nuggets in stream of software maintenance tickets
Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2014
There is an increasing trend to outsource maintenance of large applications and application portfolios of a business to third parties, specialising in application maintenance, who are incented to deliver the best possible maintenance at the lowest cost. To do so, they need to identify repeat problem areas, which cause more maintenance grief, and seek a unified remedy to avoid the costs spent on fixing these individually. These repeat areas, in a sense, represent major, evolving areas of need, or requirements, for the customer. The information about the repeating problem is typically embedded in the unstructured text of multiple tickets, waiting to be found and addressed. Currently, repeat problems are found by manual analysis; effective solutions depend on the collective experience of the team solving them. In this paper, we propose an approach to automatically analyze problem tickets to discover groups of problems being reported in them and provide meaningful, descriptive labels to help interpret these groups. Our approach incorporates a cleansing phase to handle the high level of noise observed in problem tickets and a method to incorporate multiple text clustering techniques and merge their results in a meaningful manner. We provide detailed experiments to quantitatively and qualitatively evaluate our approach.
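The approach above cleanses noisy tickets, merges the output of several text clustering techniques and labels the resulting groups. The abstract does not say how the merge works, so the sketch below swaps in co-association consensus (link two tickets when enough clusterings agree, then take connected components) as an assumption. The stopword list, the ticket-ID pattern and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list for the cleansing step.
STOPWORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "for", "not", "when"}


def cleanse(ticket):
    """Hypothetical cleansing: drop ticket IDs (e.g. inc-1234), digits and stopwords."""
    text = re.sub(r"\b[a-z]+-\d+\b", " ", ticket.lower())
    text = re.sub(r"\d+", " ", text)
    return [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]


def merge_clusterings(clusterings, n_items, min_agreement=0.5):
    """Co-association consensus over several label lists (one per clustering)."""
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_items):
        for j in range(i + 1, n_items):
            votes = sum(labels[i] == labels[j] for labels in clusterings)
            if votes / len(clusterings) >= min_agreement:
                parent[find(i)] = find(j)  # union the two tickets' groups
    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def label_group(tokens_per_ticket, members, top_n=2):
    """Descriptive label: the terms occurring in the most tickets of the group
    (ties broken alphabetically so the output is deterministic)."""
    counts = Counter(t for m in members for t in set(tokens_per_ticket[m]))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]
```

Requiring agreement from at least half of the clusterings is one simple way to keep only groupings that several techniques support.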
Software metric classification trees help guide the maintenance of large-scale systems
1989
The "80:20 rule" states that approximately 20 percent of a software system is responsible for 80 percent of its errors. This study proposes an automated method for generating empirically based models of error-prone software objects; these models are intended to help localize the "troublesome 20 percent." The proposed method uses a recursive algorithm to automatically generate classification trees whose nodes are multi-valued functions based on software metrics. The purpose of the classification trees is to identify components that are likely to be error-prone or costly, so that developers can focus their resources accordingly. We conducted a feasibility study using 16 NASA projects (3,000 to 112,000 lines). On average, the classification trees correctly identified 79.3% of the software modules that had high development effort or faults. In a second study, we are applying the proposed approach in a Hughes maintenance environment to identify fault-prone and change-prone components in a large-scale system (>100,000 lines). The use of project data from NASA and Hughes is intended to demonstrate the applicability of the method to large-scale projects. The classification tree generation tools are environment independent and are calibrated to particular environments by measurements of past releases and projects.
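The recursive tree generation described above can be sketched in a few lines. This is a simplified illustration, not the 1989 tool: it uses binary threshold splits chosen to minimize misclassified modules, whereas the paper's nodes are multi-valued functions of metrics. The metric names (`loc`, `cc`) and labels are hypothetical.

```python
def majority(labels):
    """Most common label; ties are broken alphabetically for determinism."""
    return max(sorted(set(labels)), key=labels.count)


def build_tree(rows, labels, metrics, depth=0, max_depth=3, min_size=2):
    """Recursively split modules on the (metric, threshold) pair that
    misclassifies the fewest modules. Each row maps a (hypothetical)
    metric name to its value; labels are e.g. 'fault-prone' / 'ok'."""
    if len(set(labels)) == 1 or depth == max_depth or len(rows) < min_size:
        return {"leaf": majority(labels)}
    best = None
    for metric in metrics:
        for threshold in sorted({row[metric] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[metric] <= threshold]
            right = [i for i, row in enumerate(rows) if row[metric] > threshold]
            if not left or not right:
                continue
            errors = 0
            for side in (left, right):
                side_labels = [labels[i] for i in side]
                errors += len(side) - side_labels.count(majority(side_labels))
            if best is None or errors < best[0]:
                best = (errors, metric, threshold, left, right)
    if best is None:  # no metric separates the remaining modules
        return {"leaf": majority(labels)}
    _, metric, threshold, left, right = best

    def subtree(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          metrics, depth + 1, max_depth, min_size)

    return {"metric": metric, "threshold": threshold,
            "le": subtree(left), "gt": subtree(right)}


def classify(tree, row):
    """Walk from the root to a leaf using the module's metric values."""
    while "leaf" not in tree:
        tree = tree["le"] if row[tree["metric"]] <= tree["threshold"] else tree["gt"]
    return tree["leaf"]
```

Calibrating to a new environment, in this sketch, simply means rebuilding the tree from that environment's past modules and their observed labels.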
On the Benefits of Planning and Grouping Software Maintenance Requests
2011 15th European Conference on Software Maintenance and Reengineering, 2011
Despite its unquestionable importance, software maintenance usually has a negative image among software developers and even project managers. As a result, maintenance requests are commonly treated as short-term tasks that should be implemented as quickly as possible to minimize the impact on end-users. To promote software maintenance to a first-class software development activity, we first define in this paper a lightweight process, called PASM (Process for Arranging Software Maintenance Requests), for handling maintenance as software projects. Next, we describe an in-depth evaluation of the benefits achieved by the PASM process at a real software development organization. For this purpose, we rely on a set of clustering analysis techniques to better understand and compare the requests handled before and after the adoption of the proposed process. Our results indicate that the number of projects created to handle maintenance requests nearly tripled after the organization adopted PASM. Furthermore, we found that projects based on PASM present a better balance among the various software engineering activities. For example, after adopting PASM the developers dedicated more time to analysis and validation and less time to implementation and codification tasks.
Innovations in Systems and Software Engineering, 2017
Software developers, testers and customers routinely submit issue reports to software issue trackers to record the problems they encounter when using software. The issues are then directed to appropriate experts for analysis and fixing. However, submitters often misclassify an improvement request as a bug and vice versa, which costs valuable developer time. Automated classification of submitted reports would therefore be of great practical utility. In this paper, we analyze how machine learning techniques may be used to perform this task. We apply different classification algorithms, namely naive Bayes, linear discriminant analysis, k-nearest neighbors, support vector machines (SVM) with various kernels, decision trees and random forests, separately to classify the reports from three open-source projects. We evaluate their performance in terms of F-measure, average accuracy and weighted average F-measure. Our experiments show that random forests perform best, while SVM with certain kernels also achieves high performance.
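Of the algorithms listed, naive Bayes is the shortest to show without external libraries, so it is used here for illustration even though the abstract reports random forests as the best performer. A minimal sketch, assuming a bag-of-words multinomial model with add-one smoothing; the training texts and the names `NaiveBayes` and `f_measure` are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased alphabetic tokens of an issue report."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokens(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokens(doc) if w in self.vocab)  # unseen words are ignored
        return max(self.classes, key=log_posterior)


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

A weighted average F-measure can then be obtained by computing `f_measure` per class and weighting each value by that class's share of the true labels.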
Machine Learning Based Bug Triaging for Severity Identification
2020
A bug triage system classifies bugs and routes them to individual developers, using parameters such as severity, priority and risk factor. The list of defects is taken and each defect is assigned to a developer according to its priority. Defects are then ordered by the story to be completed and assigned to different scrum teams if an agile method is followed, or to other teams if an iterative development model is followed. This process usually takes time and is costly for the company. A second problem is that there is either no learning or only manual learning, which takes into account developers' preferences or the kinds of bugs they have previously fixed. These triage challenges can be significantly reduced with machine learning algorithms that use historically solved or assigned bugs as a training set and thereby forecast the severity of a defect. Class labels are then assigned using a machine learning algorithm that combines supervised and unsupervised learning methods. This reduces manual effort and partially automates bug assignment. This project uses an SVM combined with K-means clustering to perform bug triaging, with the aim of enhancing the current system through machine learning, minimizing and simplifying human work, and thereby increasing the efficiency and productivity of the team. The SVM algorithm is used to classify the test data. To build the model, the defects are converted into tokens, the tokens are counted, the inverse document frequency of the bug words is computed from these counts, and the defects are grouped using k-means.
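The preprocessing chain described above (tokenize, count, weight by inverse document frequency, group with k-means) can be sketched without external libraries. This covers only the clustering half; the SVM classifier is omitted. The IDF variant `log(n / df) + 1`, the first-k centroid initialization and the sample defect texts are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Tokenize defects, count tokens and weight them by an assumed IDF variant.
    Vectors are L2-normalized so k-means compares term profiles, not lengths."""
    token_lists = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    n = len(docs)
    df = Counter(t for toks in token_lists for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1 for t in vocab}
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; initial centroids are the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each defect to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

In a KMEANS-SVM pipeline, the cluster assignments would typically become an extra input or a routing step for the SVM; how the project combines them is not specified in the abstract.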
This method is then compared with conventional Naive Bayes to assess whether it achieves higher accuracy with less time consumption. The models are evaluated on the time taken for categorization and the accuracy attained. Datasets from a premier airline, flydubai, were used for this evaluation. The comparison is also repeated for different sizes of the same data set, based on graphs of data size versus accuracy and data size versus logarithmic loss. The implementation results show that KMEANS-SVM has better accuracy, loss, time and memory than both Naive Bayes and SVM. KMEANS-SVM reaches a high accuracy of about 99.9912, followed by SVM at around 98.1116 and Naive Bayes at 96.4420. KMEANS-SVM also has the lowest loss, about 0.0087, followed by SVM at around 1.888 and Naive Bayes at around 3.557.
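The evaluation above reports accuracy and logarithmic loss. For reference, log loss is usually defined as the mean negative log of the probability the model assigns to the true class, -(1/N) Σ log p(yᵢ). A minimal sketch of both metrics, assuming each prediction is a dictionary of class probabilities; the clipping constant `eps` and the severity labels are illustrative assumptions.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of defects whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.
    probs is a list of dicts mapping class -> predicted probability;
    probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for label, dist in zip(y_true, probs):
        p = min(max(dist.get(label, 0.0), eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

Confident wrong predictions are penalized heavily by log loss, which is why it can separate models whose accuracies look similar.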
IT Ticket Classification: The Simpler, the Better
IEEE Access, 2020
Recently, automatic classification of IT tickets has gained notable attention due to the increasing complexity of IT services deployed in enterprises. The research and practitioner communities have discussed the design of IT ticket classification tasks without reaching a general consensus, specifically on the choice of ticket text representation techniques and classification algorithms. Our study investigates the core design elements of a typical IT ticket text classification pipeline. In particular, we compare the performance of TF-IDF and linguistic-feature-based text representations designed for ticket complexity prediction. We apply various classifiers, including kNN and its enhanced versions, decision trees, naïve Bayes, logistic regression and support vector machines, as well as semi-supervised techniques, to predict the ticket class label of low, medium or high complexity. Finally, we discuss the evaluation results and their practical implications. As our study shows, the linguistic representation is not only highly explainable but also yields a substantial increase in prediction quality over TF-IDF. Furthermore, our experiments demonstrate the importance of feature selection. We show that even simple algorithms can deliver high-quality predictions when using appropriate linguistic features.
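The abstract argues that simple classifiers over a few interpretable features can predict ticket complexity well. The sketch below pairs kNN, one of the listed classifiers, with three stand-in features (word count, average word length, count of terms from a small technical lexicon). These features and the lexicon are hypothetical placeholders, not the linguistic features used in the study.

```python
import math
import re
from collections import Counter

# Hypothetical technical lexicon; a stand-in for real linguistic features.
TECH_TERMS = {"server", "database", "vpn", "firewall", "migration", "cluster"}


def features(ticket):
    """Hypothetical interpretable features: word count, mean word length, tech terms."""
    words = re.findall(r"[a-z]+", ticket.lower())
    n = len(words)
    avg_len = sum(map(len, words)) / n if n else 0.0
    tech = sum(w in TECH_TERMS for w in words)
    return [n, avg_len, tech]


class KNN:
    """k-nearest neighbours over min-max scaled features."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, tickets, labels):
        X = [features(t) for t in tickets]
        self.lo = [min(col) for col in zip(*X)]
        self.hi = [max(col) for col in zip(*X)]
        self.X = [self._scale(x) for x in X]
        self.y = list(labels)
        return self

    def _scale(self, x):
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(x, self.lo, self.hi)]

    def predict(self, ticket):
        q = self._scale(features(ticket))
        ranked = sorted(range(len(self.X)), key=lambda i: math.dist(q, self.X[i]))
        votes = Counter(self.y[i] for i in ranked[: self.k])
        best = max(votes.values())
        # Tie-break: among the top-voted labels, return the one with the nearest member.
        for i in ranked:
            if votes.get(self.y[i]) == best:
                return self.y[i]
```

Because every feature has a plain meaning, a prediction can be explained by pointing at the neighbouring tickets and their feature values, which echoes the explainability argument in the abstract.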
Towards a Classification of Bugs to Facilitate Software Maintainability Tasks
Software maintainability is an important software quality attribute that defines the degree to which a software system can be understood, repaired or enhanced. In recent years, there has been increasing attention to techniques and tools that mine large bug repositories to help software developers understand the causes of bugs and speed up the fixing process. These techniques, however, treat all bugs in the same way: bugs that are fixed by changing a single location in the code are examined the same way as those that require complex changes. After examining more than 100 thousand bug reports from 380 projects, we found that bugs can be classified into four types based on the location of their fixes. Type 1 bugs are fixed by modifying a single location in the code, while Type 2 refers to bugs that are fixed in more than one location. Type 3 refers to multiple bugs that are fixed in the exact same location. Type 4 is an extension of Type 3, where multiple bugs are resolved by modifying the same set of locations. This classification can help companies put resources where they are needed most. It also provides useful insight into the quality of the code. Knowing, for example, that a system contains a large number of Type 4 bugs suggests high maintenance effort. This classification can also be used for other tasks, such as predicting the type of incoming bugs to improve the bug handling process. For example, if a bug is found to be of Type 4, it should be directed to experienced developers.
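The four bug types above follow mechanically from each bug's set of fix locations. A minimal sketch, assuming that fixes are available as a mapping from bug ID to the set of changed locations, and that "the same set of locations" means an exact set match; the input format, location granularity and function name are hypothetical.

```python
from collections import defaultdict


def classify_bug_types(fixes):
    """Assign Types 1-4 from fix locations.
    fixes: hypothetical mapping bug id -> set of changed code locations.
    Bugs sharing an identical location set are treated as one group."""
    by_locations = defaultdict(list)
    for bug, locations in fixes.items():
        by_locations[frozenset(locations)].append(bug)
    types = {}
    for locations, bugs in by_locations.items():
        shared = len(bugs) > 1          # several bugs fixed in the same place(s)
        single = len(locations) == 1    # fix touches exactly one location
        for bug in bugs:
            if shared:
                types[bug] = 3 if single else 4
            else:
                types[bug] = 1 if single else 2
    return types
```

Under this exact-match assumption, bugs whose fixes only partially overlap stay in Types 1 and 2; a looser overlap rule would move some of them into Types 3 and 4.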