Automatic Metric Thresholds Derivation for Code Smell Detection

Building empirical support for automated code smell detection

Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM '10, 2010

Identifying refactoring opportunities in software systems is an important activity in today's agile development environments. The concept of code smells has been proposed to characterize different types of design shortcomings in code. Additionally, metric-based detection algorithms claim to identify the "smelly" components automatically.

ConcernMeBS: Metrics-based Detection of Code Smells

2013

Software metrics have traditionally been used to evaluate the maintainability of software systems and to detect code smells. Code smells are symptoms that may indicate something wrong in the system code. Recently, concern-sensitive metrics and metrics-based heuristics have been proposed to detect code smells. However, applying these metrics and heuristics is time-consuming without proper tool support. To address this problem, this paper presents a tool, called ConcernMeBS, to help developers detect code smells. Based on a concern-to-code mapping, ConcernMeBS automatically finds and reports classes and methods that are prone to code smells in OO source code.

Automatic detection of bad smells in code: An experimental assessment

The Journal of Object Technology, 2012

Code smells are structural characteristics of software that may indicate a code or design problem that makes software hard to evolve and maintain, and may trigger refactoring of code. Recent research is active in defining automatic detection tools to help humans in finding smells when code size becomes unmanageable for manual review. Since the definitions of code smells are informal and subjective, assessing how effective code smell detection tools are is both important and hard to achieve. This paper reviews the current panorama of the tools for automatic code smell detection. It defines research questions about the consistency of their responses, their ability to expose the regions of code most affected by structural decay, and the relevance of their responses with respect to future software evolution. It gives answers to them by analyzing the output of four representative code smell detectors applied to six different versions of GanttProject, an open source system written in Java. The results of these experiments cast light on what current code smell detection tools are able to do and what the relevant areas for further improvement are.

QScored: A Large Dataset of Code Smells and Quality Metrics

2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), 2021

Code quality aspects such as code smells and code quality metrics are widely used in exploratory and empirical software engineering research. In such studies, researchers spend a substantial amount of time and effort not only to select appropriate subject systems but also to analyze them to collect the required code quality information. In this paper, we present the QScored dataset, which contains code quality information for more than 86 thousand C# and Java GitHub repositories comprising more than 1.1 billion lines of code. The code quality information covers seven kinds of detected architecture smells, 20 kinds of design smells, eleven kinds of implementation smells, and 27 commonly used code quality metrics computed at project, package, class, and method levels. The dataset will facilitate empirical studies involving code quality aspects by making this information readily available for a large number of active GitHub repositories.

A Novel Approach for Code Smell Detection: An Empirical Study

IEEE Access

Code smell detection helps improve the understandability and maintainability of software while reducing the chances of system failure. In this study, six machine learning algorithms are applied to predict code smells. For this purpose, four code smell datasets (God-class, Data-class, Feature-envy, and Long-method), generated from 74 open-source systems, are considered. To evaluate the performance of the machine learning algorithms on these datasets, 10-fold cross-validation is applied, which partitions the original dataset into a training set to train the model and a test set to evaluate it. Two feature selection techniques, Chi-squared and Wrapper-based, are applied to improve the prediction accuracy of all six machine learning methods by choosing the top metrics in each dataset, and the results obtained with the two techniques are compared. To further improve the accuracy of these algorithms, a grid search-based parameter optimization technique is applied. In this study, 100% accuracy was obtained for the Long-method dataset using the Logistic Regression algorithm with all features, while the worst performance, 95.20%, was obtained by the Naive Bayes algorithm for the Long-method dataset using the chi-square feature selection technique.
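The 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' setup: the function name `k_fold_indices`, the shuffling seed, and the round-robin fold assignment are all assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: every sample lands in exactly one test fold,
    and the remaining k - 1 folds form the training set for that round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Usage: 25 samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Each round trains on roughly 90% of the samples and evaluates on the held-out fold; the reported accuracy is then typically the average over the ten rounds.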

On the Effectiveness of Concern Metrics to Detect Code Smells: An Empirical Study

Lecture Notes in Computer Science, 2014

Traditional software metrics have been used to evaluate the maintainability of software programs by supporting the identification of code smells. Recently, concern metrics have also been proposed for this purpose. While traditional metrics quantify properties of software modules, concern metrics quantify concern properties, such as scattering and tangling. Although concern metrics are increasingly used in empirical studies, there is a lack of empirical knowledge about their effectiveness in detecting code smells. This paper reports the results of an empirical study investigating whether concern metrics can be useful indicators of three code smells, namely Divergent Change, Shotgun Surgery, and God Class. In this study, 54 subjects from two different institutions analyzed traditional and concern metrics in order to detect instances of these code smells in two information systems. The study results indicate that, in general, concern metrics support developers in detecting code smells. In particular, we observed that (i) the time spent in code smell detection is more relevant than the developers' expertise; (ii) concern metrics are clearly useful for detecting Divergent Change and God Class; and (iii) the concern metric Number of Concerns per Component is a reliable indicator of Divergent Change.
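As a rough illustration of the Number of Concerns per Component metric named above, the sketch below counts how many distinct concerns are mapped to each component. The abstract does not define the metric formally, so this counting rule is an assumption, and the function name, concern names, and component names are all hypothetical.

```python
from collections import defaultdict


def concerns_per_component(concern_map):
    """Count the distinct concerns mapped to each component.

    concern_map: concern name -> iterable of component names.
    Assumed reading of the metric, for illustration only.
    """
    concerns_by_component = defaultdict(set)
    for concern, components in concern_map.items():
        for component in components:
            concerns_by_component[component].add(concern)
    return {c: len(s) for c, s in concerns_by_component.items()}


# Hypothetical concern-to-code mapping.
mapping = {
    "persistence": ["OrderDao", "OrderService"],
    "logging": ["OrderService", "Report"],
    "billing": ["OrderService"],
}
ncc = concerns_per_component(mapping)
```

Under this reading, a component such as the hypothetical `OrderService`, which realizes three concerns, would be a candidate for closer inspection, since changes to any of those concerns may force it to change.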

A Lightweight Approach for Detection of Code Smells

Arabian Journal for Science and Engineering, 2016

The accurate removal of code smells from source code supports activities such as refactoring, maintenance, and examining code quality. A large number of techniques and tools for the specification and detection of code smells have been presented in the last decade, but they still lack accuracy and flexibility due to different interpretations of code smell definitions. Most techniques target only a few code smells and produce different results on the same examined systems, owing to differing informal definitions and differing threshold values for the metrics used in detection. We present a flexible and lightweight approach based on multiple searching techniques for the detection and visualization of all 22 code smells from source code in multiple languages. Our approach is lightweight and flexible because it applies SQL queries to an intermediate repository and regular expressions to selected source code constructs. The approach is validated through experiments on eight publicly available open source software projects developed in Java and C#, and the results are compared with existing approaches. The accuracy of the presented approach varies from 86% to 97% on the eight selected software projects.

Automatic Human-Like Detection of Code Smells

Discovery Science, 2021

Many code smell detection techniques and tools have been proposed, mainly aiming to eliminate design flaws and improve software quality. Most of them are based on heuristics that rely on a set of software metrics and corresponding threshold values. Those techniques and tools suffer from subjectivity, discordant results among the tools, and unreliable thresholds. To mitigate these problems, we used machine learning to automate developers' perception in code smell detection. Unlike other machine learning approaches to code smell detection, we trained our models on an extensive dataset based on more than 3000 professional reviews of 518 open source projects. We conclude with an empirical evaluation of the performance of the machine learning approach against PMD, a widely used metric-based code smell detection tool for Java. The experimental results show that the machine learning approach outperforms the PMD classifier in all evaluations.
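The metric-and-threshold heuristics that this abstract criticizes can be sketched as a simple rule. The example below is illustrative only and does not reproduce PMD or any specific tool: the metric set (WMC, ATFD, TCC), the rule shape, and every threshold value are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    name: str
    wmc: int    # weighted methods per class
    atfd: int   # accesses to foreign data
    tcc: float  # tight class cohesion, in [0, 1]


# Hypothetical thresholds, chosen only for illustration.
DEFAULT_THRESHOLDS = {"wmc": 47, "atfd": 5, "tcc": 1 / 3}


def is_god_class(m, thresholds=DEFAULT_THRESHOLDS):
    """Flag a class when all three metric conditions hold."""
    return (
        m.wmc >= thresholds["wmc"]
        and m.atfd > thresholds["atfd"]
        and m.tcc < thresholds["tcc"]
    )


candidate = ClassMetrics("OrderManager", wmc=50, atfd=6, tcc=0.2)
flagged_default = is_god_class(candidate)
# A stricter, equally arbitrary WMC threshold flips the verdict.
flagged_strict = is_god_class(candidate, {"wmc": 60, "atfd": 5, "tcc": 1 / 3})
```

The same class is flagged under one set of thresholds and not under another, which is the kind of discordance between tools that the abstract describes.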

A Bayesian Approach for the Detection of Code and Design Smells

The presence of code and design smells can have a severe impact on the quality of a program. Consequently, their detection and correction have drawn the attention of both researchers and practitioners who have proposed various approaches to detect code and design smells in programs. However, none of these approaches handle the inherent uncertainty of the detection process. We propose a Bayesian approach to manage this uncertainty. First, we present a systematic process to convert existing state-of-the-art detection rules into a probabilistic model. We illustrate this process by generating a model to detect occurrences of the Blob antipattern. Second, we present results of the validation of the model: we built this model on two open-source programs, GanttProject v1.10.2 and Xerces v2.7.0, and measured its accuracy. Third, we compare our model with another approach to show that it returns the same candidate classes while ordering them to minimise the quality analysts’ effort. Finally, we show that when past detection results are available, our model can be calibrated using machine learning techniques to offer an improved, context-specific detection.