Hà Nguyễn - Academia.edu (original) (raw)
Uploads
Papers by Hà Nguyễn
Code clone management has been shown to have several benefits for software developers. When sourc... more Code clone management has been shown to have several benefits for software developers. When source code evolves, clone management requires a mechanism to efficiently and incrementally detect code clones in the new revision. This paper introduces an incremental clone detection tool, called ClemanX. Our tool represents code fragments as subtrees of Abstract Syntax Trees (ASTs), measures their similarity levels based on their characteristic vectors of structural features, and solves the task of incrementally detecting similar code as an incremental distance-based clustering problem. Our empirical evaluation on large-scale software projects shows the usefulness and good performance of ClemanX.
Existing version control systems are often based on text line-oriented models for change represen... more Existing version control systems are often based on text line-oriented models for change representation, which do not facilitate software developers in understanding code evolution. Other advanced change representation models that encompass more program semantics and structures are still not quite practical due to their high computational complexity. This paper presents OperV, a novel operation-based version control model that is able to support both coarse and fine levels of granularity in program source code. In OperV, a software system is represented by a project tree whose nodes represent all program entities, such as packages, classes, methods, etc. The changes of the system are represented via edit operations on the tree. OperV also provides the algorithms to differ, store, and retrieve the versions of such entities. These algorithms are based on the mapping of the nodes between versions of the project tree. This mapping technique uses 1) divide-and-conquer technique to map coarse- and fine-grained entities separately, 2) unchanged text regions to map unchanged leaf nodes, and 3) structure-based similarity of the sub-trees to map their root nodes bottom-up and then top-down. The empirical evaluation of OperV has shown that it is scalable, efficient, and could be useful in understanding program evolution.
However, there has been little work on clone detection in models with the limitations on detectio... more However, there has been little work on clone detection in models with the limitations on detection precision and completeness. This paper presents ModelCD, a novel clone detection tool for Matlab/Simulink models, that is able to efficiently and accurately detect both exactly matched and approximate model clones. The core of ModelCD is two novel graph-based clone detection algorithms that are able to systematically and incrementally discover clones with a high degree of completeness, accuracy, and scalability. We have conducted an empirical evaluation with various experimental studies on many real-world systems to demonstrate the usefulness of our approach and to compare the performance of ModelCD with existing tools.
Structure-oriented approaches in clone detection have become popular in both code-based and model... more Structure-oriented approaches in clone detection have become popular in both code-based and model-based clone detection. However, existing methods for capturing structural information in software artifacts are either too computationally expensive to be efficient or too light-weight to be accurate in clone detection. In this paper, we present Exas, an accurate and efficient structural characteristic feature extraction approach that better approximates and captures the structure within the fragments of artifacts. Exas structural features are the sequences of labels and numbers built from nodes, edges, and paths of various lengths of a graph-based representation. A fragment is characterized by a structural characteristic vector of the occurrence counts of those features. We have applied Exas in building two clone detection tools for source code and models. Our analytic study and empirical evaluation on open-source software show that Exas and its algorithm for computing the characteristic vectors are highly accurate and efficient in clone detection.
Code clone management has been shown to have several benefits for software developers. When sourc... more Code clone management has been shown to have several benefits for software developers. When source code evolves, clone management requires a mechanism to efficiently and incrementally detect code clones in the new revision. This paper introduces an incremental clone detection tool, called ClemanX. Our tool represents code fragments as subtrees of Abstract Syntax Trees (ASTs), measures their similarity levels based on their characteristic vectors of structural features, and solves the task of incrementally detecting similar code as an incremental distance-based clustering problem. Our empirical evaluation on large-scale software projects shows the usefulness and good performance of ClemanX.
Existing version control systems are often based on text line-oriented models for change represen... more Existing version control systems are often based on text line-oriented models for change representation, which do not facilitate software developers in understanding code evolution. Other advanced change representation models that encompass more program semantics and structures are still not quite practical due to their high computational complexity. This paper presents OperV, a novel operation-based version control model that is able to support both coarse and fine levels of granularity in program source code. In OperV, a software system is represented by a project tree whose nodes represent all program entities, such as packages, classes, methods, etc. The changes of the system are represented via edit operations on the tree. OperV also provides the algorithms to differ, store, and retrieve the versions of such entities. These algorithms are based on the mapping of the nodes between versions of the project tree. This mapping technique uses 1) divide-and-conquer technique to map coarse- and fine-grained entities separately, 2) unchanged text regions to map unchanged leaf nodes, and 3) structure-based similarity of the sub-trees to map their root nodes bottom-up and then top-down. The empirical evaluation of OperV has shown that it is scalable, efficient, and could be useful in understanding program evolution.
However, there has been little work on clone detection in models with the limitations on detectio... more However, there has been little work on clone detection in models with the limitations on detection precision and completeness. This paper presents ModelCD, a novel clone detection tool for Matlab/Simulink models, that is able to efficiently and accurately detect both exactly matched and approximate model clones. The core of ModelCD is two novel graph-based clone detection algorithms that are able to systematically and incrementally discover clones with a high degree of completeness, accuracy, and scalability. We have conducted an empirical evaluation with various experimental studies on many real-world systems to demonstrate the usefulness of our approach and to compare the performance of ModelCD with existing tools.
Structure-oriented approaches in clone detection have become popular in both code-based and model... more Structure-oriented approaches in clone detection have become popular in both code-based and model-based clone detection. However, existing methods for capturing structural information in software artifacts are either too computationally expensive to be efficient or too light-weight to be accurate in clone detection. In this paper, we present Exas, an accurate and efficient structural characteristic feature extraction approach that better approximates and captures the structure within the fragments of artifacts. Exas structural features are the sequences of labels and numbers built from nodes, edges, and paths of various lengths of a graph-based representation. A fragment is characterized by a structural characteristic vector of the occurrence counts of those features. We have applied Exas in building two clone detection tools for source code and models. Our analytic study and empirical evaluation on open-source software show that Exas and its algorithm for computing the characteristic vectors are highly accurate and efficient in clone detection.