Establishing a Traceability Links Between The Source Code And Requirement Analysis, A Survey on Traceability

Recovery of traceability links between software documentation and source code

2005

An approach for the semi-automated recovery of traceability links between software documentation and source code is presented. The methodology is based on the application of information retrieval techniques to extract and analyze the semantic information from the source code and associated documentation. A semi-automatic process is defined based on the proposed methodology. The paper advocates the use of latent semantic indexing (LSI) as the supporting information retrieval technique.

Recovering traceability links between code and documentation

IEEE Transactions on Software Engineering, 2002

Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs, and related maintenance reports. We propose a method based on information retrieval to recover traceability links between source code and free text documents. A premise of our work is that programmers use meaningful names for program items, such as functions, variables, types, classes, and methods. We believe that the application-domain knowledge that programmers process when writing the code is often captured by the mnemonics for identifiers; therefore, the analysis of these mnemonics can help to associate high-level concepts with program concepts and vice-versa. We apply both a probabilistic and a vector space information retrieval model in two case studies to trace C++ source code onto manual pages and Java code to functional requirements. We compare the results of applying the two models, discuss the benefits and limitations, and describe directions for improvements.
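The premise that identifier mnemonics carry application-domain knowledge can be illustrated with a small sketch. The helper below is hypothetical (the name `split_identifier` and the splitting rules are assumptions, not taken from the paper); it only shows how a camelCase or snake_case identifier might be broken into terms that can then be matched against free text.

```python
import re

def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms.

    Hypothetical helper for illustration; the paper does not specify
    its exact tokenization rules.
    """
    parts = []
    # Break on underscores first, then on case transitions and digit runs.
    for chunk in name.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]

if __name__ == "__main__":
    for ident in ["computeRoomPrice", "HTTPServer_id", "max_value2"]:
        print(ident, "->", split_identifier(ident))
```

Terms produced this way could be fed to either retrieval model as the textual content of a source file.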

Improving the identification of traceability links between source code and requirements

Software developers are interested in requirement traceability to, e.g., verify whether all requirements are covered by a system design specification. Based on the assumption that related artifacts contain related terms, researchers have developed, used, and extended algorithms that identify related terms and subsequently infer which artifacts are related (i.e., there is a traceability link between them). Source code is not as verbose as a natural language description, which reduces the applicability of algorithms that rely precisely on such a commonality. This paper extends the Vector Space Model using tf*idf term weights to improve the identification of traceability links between source code and requirements. To this end, we modify how requirements are identified and include user feedback. We show that the inclusion of user feedback significantly improved the number of correctly identified requirements.
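As a point of reference, the baseline Vector Space Model with tf*idf weights can be sketched as follows. This is a minimal illustration of the plain model only, not the extension or the user-feedback step the paper proposes. The function names, the idf formula `log(N / df)`, and the toy data are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse tf*idf vectors (dicts) for a list of token lists."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
           math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank_links(requirement, code_docs):
    """Rank code artifacts by similarity to one requirement (assumed setup)."""
    names = list(code_docs)
    vectors = tfidf_vectors([requirement] + [code_docs[n] for n in names])
    query = vectors[0]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    requirement = ["book", "room", "guest"]           # toy requirement terms
    code_docs = {
        "Booking": ["book", "room", "price"],         # toy class vocabularies
        "Logger": ["log", "file", "write"],
    }
    for name, score in rank_links(requirement, code_docs):
        print(f"{name}: {score:.3f}")
```

Artifacts that share no terms with the requirement score zero, which reflects the limitation the abstract notes for terse source code.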

Information retrieval models for recovering traceability links between code and documentation

2000

The research described in the paper is concerned with the application of information retrieval to software maintenance, and in particular to the problem of recovering traceability links between the source code of a system and its free text documentation. We introduce a method based on the general idea of vector space information retrieval and apply it in two case studies to trace C++ source code onto manual pages and Java code onto functional requirements.

The role of traceability in requirements engineering

Software documentation, usually expressed in natural language, contains much useful information. Establishing traceability links between documentation and source code can therefore be very helpful for software engineering management tasks such as requirement traceability, impact analysis, and software reuse. Currently, the recovery of traceability links is mostly based on information retrieval techniques, for instance, the probabilistic model, the vector space model, and latent semantic indexing. Previous work treats both documentation and source code as plain text files. The quality of retrieved links can be improved by imposing additional structure that exploits the fact that they are software engineering documents. In this paper, we present four enhanced strategies to improve the traditional LSI method based on the special characteristics of documentation. Experimental results show that the first three enhanced strategies can increase the precision of retrieved links by 5%∼16%, while the fourth strategy yields about 13%.

Normalizing source code vocabulary

Proceedings - Working Conference on Reverse Engineering, WCRE, 2010

The potential benefits of traceability are well known, as is the impracticability of recovering and maintaining traceability links manually. Indeed, the manual management of traceability information is an error-prone and time-consuming task. Consequently, despite the advantages that can be gained, explicit traceability is rarely established unless there is a regulatory reason for doing so. Extensive efforts have been made in the software engineering community (both research and commercial) to improve the explicit connection of software artifacts. Promising results have been achieved using Information Retrieval (IR) techniques for traceability recovery. IR-based traceability recovery methods propose a list of candidate traceability links based on the similarity between the text contained in the software artifacts. Software artifacts have different structures, and the common element among many of them is the textual data, which most often captures the informal semantics of artifacts. For example, source code includes a large volume of textual data in the form of comments and identifiers. Consequently, IR-based approaches are very well suited to address the traceability recovery problem. The conjecture is that artifacts with high textual similarity are good candidates to be traced to each other since they share several concepts. In this chapter we give an overview of a general process of using IR-based methods for traceability link recovery and examine some of them in greater detail: probabilistic, vector space, and Latent Semantic Indexing models. Finally, we discuss common approaches to measuring the performance of IR-based traceability methods and the latest advances in techniques for analysis of candidate links.
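Performance of IR-based traceability methods is commonly reported as precision and recall over the candidate link list. The sketch below assumes links are represented as (source, target) pairs and that a manually validated set of true links is available. The function name and the toy data are hypothetical.

```python
def precision_recall(candidates, true_links):
    """Compute precision and recall of candidate links against a validated set.

    Both arguments are iterables of (source, target) pairs; this
    representation is an assumption made for illustration.
    """
    retrieved = set(candidates)
    relevant = set(true_links)
    correct = retrieved & relevant
    precision = len(correct) / len(retrieved) if retrieved else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    candidates = [("R1", "A"), ("R1", "B"), ("R2", "C"), ("R3", "D")]
    true_links = {("R1", "A"), ("R2", "C"), ("R2", "E")}
    p, r = precision_recall(candidates, true_links)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Cutting the ranked candidate list at a higher similarity threshold typically trades recall for precision, which is why both figures are usually reported together.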

Recovering documentation-to-source-code traceability links using latent semantic indexing

2003

An information retrieval technique, latent semantic indexing, is used to automatically identify traceability links from system documentation to program source code. The results of two experiments to identify links in existing software systems (i.e., the LEDA library and Albergate) are presented. These results are compared with those of similar traceability link identification experiments that used other information retrieval techniques.

Using traceability links to assess and maintain the quality of software documentation

2007

The paper proposes an approach for using traceability links to assess and maintain the quality of software documentation. Our position is that quality documentation should accurately reflect the structure of the source code; hence elements of documentation that link to strongly coupled elements of the source code should also be strongly related. We use latent semantic indexing (LSI) to compute similarities among sections of external documentation.

Recovering traceability links in software artifact management systems using information retrieval methods

ACM Transactions on Software Engineering and Methodology, 2007

The main drawback of existing software artifact management systems is the lack of automatic or semi-automatic traceability link generation and maintenance. We have improved an artifact management system with a traceability recovery tool based on Latent Semantic Indexing (LSI), an information retrieval technique. We have assessed LSI to identify strengths and limitations of using information retrieval techniques for traceability recovery and identified the need for an incremental approach. The method and the tool have been evaluated during the development of seventeen software projects involving about 150 students. We observed that although tools based on information retrieval provide useful support for the identification of traceability links during software development, they are still far from supporting a complete semi-automatic recovery of all links. The results of our experience have also shown that such tools can help to identify quality problems in the textual description of traced artifacts.

Combining textual and structural analysis of software artifacts for traceability link recovery

2009 ICSE Workshop on Traceability in Emerging Forms of Software Engineering, 2009

Existing methods for recovering traceability links among software documentation artifacts analyze textual similarities among these artifacts. It may be the case, however, that related documentation elements share little terminology or phrasing. This paper presents a technique for indirectly recovering these traceability links in requirements documentation by combining textual with structural information as we conjecture that related requirements share related source code elements. A preliminary case study indicates that our combined approach improves the precision and recall of recovering relevant links among documents as compared to stand-alone methods based solely on analyzing textual similarities.