Using traceability links to assess and maintain the quality of software documentation
Related papers
Recovery of traceability links between software documentation and source code
2005
An approach for the semi-automated recovery of traceability links between software documentation and source code is presented. The methodology is based on the application of information retrieval techniques to extract and analyze the semantic information from the source code and associated documentation. A semi-automatic process is defined based on the proposed methodology. The paper advocates the use of latent semantic indexing (LSI) as the supporting information retrieval technique.
Recovering documentation-to-source-code traceability links using latent semantic indexing
2003
An information retrieval technique, latent semantic indexing, is used to automatically identify traceability links from system documentation to program source code. The results of two experiments to identify links in existing software systems (i.e., the LEDA library and Albergate) are presented. These results are compared with other similar experimental results of traceability link identification using different types of information retrieval techniques.
Information retrieval models for recovering traceability links between code and documentation
2000
The research described in the paper is concerned with the application of information retrieval to software maintenance, and in particular to the problem of recovering traceability links between the source code of a system and its free text documentation. We introduce a method based on the general idea of vector space information retrieval and apply it in two case studies to trace C++ source code onto manual pages and Java code onto functional requirements.
Combining textual and structural analysis of software artifacts for traceability link recovery
2009 ICSE Workshop on Traceability in Emerging Forms of Software Engineering, 2009
Existing methods for recovering traceability links among software documentation artifacts analyze textual similarities among these artifacts. It may be the case, however, that related documentation elements share little terminology or phrasing. This paper presents a technique for indirectly recovering these traceability links in requirements documentation by combining textual with structural information as we conjecture that related requirements share related source code elements. A preliminary case study indicates that our combined approach improves the precision and recall of recovering relevant links among documents as compared to stand-alone methods based solely on analyzing textual similarities.
Recovering traceability links between code and documentation
IEEE Transactions on Software Engineering, 2002
Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs, and related maintenance reports. We propose a method based on information retrieval to recover traceability links between source code and free text documents. A premise of our work is that programmers use meaningful names for program items, such as functions, variables, types, classes, and methods. We believe that the application-domain knowledge that programmers process when writing the code is often captured by the mnemonics for identifiers; therefore, the analysis of these mnemonics can help to associate high-level concepts with program concepts and vice-versa. We apply both a probabilistic and a vector space information retrieval model in two case studies to trace C++ source code onto manual pages and Java code to functional requirements. We compare the results of applying the two models, discuss the benefits and limitations, and describe directions for improvements.
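The vector space side of this idea can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the smoothed TF-IDF weighting, and the sample class and requirement texts (`AccountManager`, `ReportPrinter`, `REQ-1`, `REQ-2`) are all assumptions made for illustration, and the probabilistic model is not shown. The sketch splits identifiers into their mnemonic parts, weights terms, and ranks code units against each document by cosine similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return lowercase terms, splitting camelCase and snake_case identifiers."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        # "openAccount" -> ["open", "Account"], "HTTPServer" -> ["HTTP", "Server"]
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        terms.extend(part.lower() for part in parts)
    return terms


def tfidf_vectors(corpus):
    """Map each artifact name to a sparse TF-IDF vector (dict of term -> weight)."""
    counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n = len(counts)
    # Smoothed IDF (an assumed weighting choice, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return {
        name: {t: c * idf[t] for t, c in term_counts.items()}
        for name, term_counts in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_links(code_units, documents):
    """For each document, rank code units by descending similarity."""
    corpus = {("code", name): text for name, text in code_units.items()}
    corpus.update({("doc", name): text for name, text in documents.items()})
    vectors = tfidf_vectors(corpus)
    ranking = {}
    for doc in documents:
        scored = [
            (unit, cosine(vectors[("doc", doc)], vectors[("code", unit)]))
            for unit in code_units
        ]
        ranking[doc] = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranking


# Hypothetical artifacts, invented for this example.
code_units = {
    "AccountManager": "class AccountManager { void openAccount(Customer customer); "
                      "void closeAccount(int accountId); }",
    "ReportPrinter": "class ReportPrinter { void printMonthlyReport(Printer printer); }",
}
documents = {
    "REQ-1": "The customer can open an account and close an account.",
    "REQ-2": "The system prints a monthly report.",
}
ranking = rank_links(code_units, documents)
for doc, candidates in ranking.items():
    print(doc, candidates)
```

The ranked list is only a set of candidate links; in the approaches described here, a human still confirms or rejects each candidate.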
2014
In system and source code development, source code is developed and documentation is written mainly in natural language. Continuous and frequent development requires proper requirements change management. Traceability is essential for managing change and analyzing its impact. This research paper presents a technique in the domain of traceability. We believe that the application-domain knowledge that programmers process when writing the code is often captured by the mnemonics for identifiers; the analysis of these mnemonics can help to associate high-level concepts with program concepts and vice-versa. We propose a method based on information retrieval for traceability links between source code and free text documents. Here we use information retrieval techniques to establish links between source code and requirements documentation, and latent semantic indexing is used to automatically identify traceability links from system code. Traceability is the most important f...
Normalizing source code vocabulary
Proceedings - Working Conference on Reverse Engineering, WCRE, 2010
The potential benefits of traceability are well known, as is the impracticability of recovering and maintaining traceability links manually. Indeed, the manual management of traceability information is an error-prone and time-consuming task. Consequently, despite the advantages that can be gained, explicit traceability is rarely established unless there is a regulatory reason for doing so. Extensive efforts have been made in the software engineering community (both research and commercial) to improve the explicit connection of software artifacts. Promising results have been achieved using Information Retrieval (IR) techniques for traceability recovery. IR-based traceability recovery methods propose a list of candidate traceability links based on the similarity between the text contained in the software artifacts. Software artifacts have different structures, and the common element among many of them is the textual data, which most often captures the informal semantics of artifacts. For example, source code includes a large volume of textual data in the form of comments and identifiers. Consequently, IR-based approaches are very well suited to address the traceability recovery problem. The conjecture is that artifacts with high textual similarity are good candidates to be traced to each other since they share several concepts. In this chapter we overview a general process of using IR-based methods for traceability link recovery and cover some of them in greater detail: probabilistic, vector space, and Latent Semantic Indexing models. Finally, we discuss common approaches to measuring the performance of IR-based traceability methods and the latest advances in techniques for analysis of candidate links.
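Precision and recall are the usual way to measure a list of candidate links against a manually validated set. The sketch below is a hedged illustration, not taken from the chapter: the function names, the similarity threshold, and the sample links are all hypothetical.

```python
def cut_at_threshold(ranked_links, threshold):
    """Keep (source, target) pairs whose similarity score meets the threshold."""
    return [(src, tgt) for src, tgt, score in ranked_links if score >= threshold]


def precision_recall(candidate_links, true_links):
    """Precision = correct candidates / all candidates; recall = correct candidates / all true links."""
    candidates = set(candidate_links)
    correct = set(true_links)
    hits = candidates & correct
    precision = len(hits) / len(candidates) if candidates else 0.0
    recall = len(hits) / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical scored candidates and a hypothetical validated link set.
ranked_links = [
    ("REQ-1", "A", 0.9),
    ("REQ-2", "B", 0.7),
    ("REQ-1", "B", 0.4),
    ("REQ-2", "A", 0.1),
]
true_links = {("REQ-1", "A"), ("REQ-2", "B"), ("REQ-2", "C")}

for threshold in (0.5, 0.3):
    candidates = cut_at_threshold(ranked_links, threshold)
    print(threshold, precision_recall(candidates, true_links))
```

Lowering the threshold admits more candidates, which can raise recall at the cost of precision; in this toy data it only adds a false positive.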
ACM Transactions on Software Engineering and Methodology, 2007
The main drawback of existing software artifact management systems is the lack of automatic or semi-automatic traceability link generation and maintenance. We have improved an artifact management system with a traceability recovery tool based on Latent Semantic Indexing (LSI), an information retrieval technique. We have assessed LSI to identify strengths and limitations of using information retrieval techniques for traceability recovery and devised the need for an incremental approach. The method and the tool have been evaluated during the development of seventeen software projects involving about 150 students. We observed that although tools based on information retrieval provide useful support for the identification of traceability links during software development, they are still far from supporting a complete semi-automatic recovery of all links. The results of our experience have also shown that such tools can help to identify quality problems in the textual description of traced artifacts.
A Topic Modeling Based Solution for Confirming Software Documentation Quality
International Journal of Advanced Computer Science and Applications, 2016
This paper presents an approach for evaluating and confirming the quality of external software documentation using topic modeling. Typically, the quality of the external documentation has to mirror precisely the organization of the source code. Therefore, the elements of such documentation should be strongly written, associated, and presented. In this paper, we use Latent Dirichlet Allocation (LDA) and the Hellinger distance to compute the similarities between the fragments of source code and the external documentation topics. These similarities are used in this paper to improve and advance the existing external documentation. Furthermore, these similarities can also be used for evaluating the new documenting process during the evolution phase of the software. The results show that the new approach yields state-of-the-art performance in evaluating and confirming the existing external documentation's quality and superiority.
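The Hellinger distance between two discrete topic distributions P and Q is H(P, Q) = (1/√2) · √(Σ (√p_i − √q_i)²), ranging from 0 (identical) to 1 (disjoint support). The sketch below computes it for hand-written distributions. The topic vectors are assumed values standing in for LDA output, not results from the paper, and the names are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


# Assumed topic distributions over three topics (illustrative values only).
code_fragment_topics = [0.70, 0.20, 0.10]
doc_section_topics = {
    "section_parsing": [0.65, 0.25, 0.10],
    "section_printing": [0.05, 0.15, 0.80],
}

# Smaller distance means the section is topically closer to the code fragment.
closest = min(
    doc_section_topics,
    key=lambda name: hellinger(code_fragment_topics, doc_section_topics[name]),
)
print(closest)
```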
Mining software repositories for traceability links
15th IEEE International Conference on Program Comprehension (ICPC '07), 2007
An approach to recover/discover traceability links between software artifacts via the examination of a software system's version history is presented. A heuristic-based approach that uses sequential-pattern mining is applied to the commits in software repositories for uncovering highly frequent co-changing sets of artifacts (e.g., source code and documentation). If different types of files are committed together with high frequency, then there is a high probability that they have a traceability link between them. The approach is evaluated on a number of versions of the open source system KDE. As a validation step, the discovered links are used to predict similar changes in the newer versions of the same system. The results show highly precise predictions of certain types of traceability links.
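The co-change intuition can be sketched with a much simpler stand-in than the sequential-pattern mining the paper describes: counting how often a code file and a documentation file appear in the same commit, then keeping pairs above a minimum support. The file extensions, paths, commit data, and `min_support` value below are all hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed set of documentation file extensions (illustrative only).
DOC_EXTENSIONS = (".txt", ".md", ".docbook", ".html")


def is_doc(path):
    """Classify a path as documentation based on its extension."""
    return path.endswith(DOC_EXTENSIONS)


def co_change_links(commits, min_support=2):
    """Return {(code_file, doc_file): count} for pairs committed together at least min_support times."""
    counts = Counter()
    for files in commits:
        unique = set(files)
        docs = sorted(f for f in unique if is_doc(f))
        code = sorted(f for f in unique if not is_doc(f))
        # Every code/doc pair in the same commit counts as one co-change.
        counts.update(product(code, docs))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical commit history: each commit is the list of files it touched.
commits = [
    ["src/parser.cpp", "doc/parser.docbook"],
    ["src/parser.cpp", "doc/parser.docbook", "src/main.cpp"],
    ["src/main.cpp"],
    ["src/ui.cpp", "doc/ui.docbook"],
]
links = co_change_links(commits, min_support=2)
print(links)
```

Plain pair counting ignores the order of changes across commits, which is the part a sequential-pattern miner would add.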