Assessing the quality factors found in in-line documentation written in natural language: The JavadocMiner

Textual Analysis and Software Quality: Challenges and Opportunities

Source code lexicon (identifier names and comments) has been used, as an alternative or as a complement to source code structure, to perform various kinds of analyses (e.g., traceability recovery). These successful applications have increased interest in recent years in using textual analysis to assess and improve the quality of a software system. In particular, textual analysis could be used to identify refactoring opportunities or ambiguous identifiers that may increase the program comprehension burden by creating a mismatch between the developers' cognitive model and the intended meaning of the term, thus ultimately increasing the risk of fault proneness. In addition, when used "on-line" during software development, textual analysis could guide programmers toward better identifiers, improving the quality of the source code lexicon. In this paper, we overview research in text analysis for the assessment and the improvement of software quality and discuss our achievements to date, the challenges, and the opportunities for the future.

Challenges in Analyzing Software Documentation in Portuguese

2015 29th Brazilian Symposium on Software Engineering, 2015

Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tooling for the interpretation of natural language text produced by software developers, such as documentation, code comments, commit messages, or bug reports. Processing natural language text produced by software developers is challenging because of unique characteristics not found in other texts, such as the presence of code terms and the systematic use of incomplete sentences. In addition, texts produced by Portuguese-speaking developers mix languages since many keywords and programming concepts are referred to by their English name. In this paper, we provide empirical insights into the challenges of analyzing software artifacts written in Portuguese. We analyzed 100 question titles from the Portuguese version of Stack Overflow with two Portuguese language tools and identified multiple problems which resulted in very few sentences being tagged completely correctly. Based on these results, we propose heuristics to improve the analysis of natural language text produced by software developers in Portuguese.

The Evaluation of an Approach for Automatic Generated Documentation

2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017

Two studies are conducted to evaluate an approach to automatically generate natural language documentation summaries for C++ methods. The documentation approach relies on a method's stereotype information. First, each method is automatically assigned a stereotype(s) based on static analysis and a set of heuristics. Then, the approach uses the stereotype information, static analysis, and predefined templates to generate a natural-language summary/documentation for each method. This documentation is automatically added to the code base as a comment for each method. The result of the first study reveals that the generated documentation is accurate, does not include unnecessary information, and does a reasonable job describing what the method does. Based on statistical analysis of the second study, the most important part of the documentation is the short description as it describes the intended behavior of a method.
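The two-step pipeline described in this abstract (assign a stereotype with heuristics, then fill a predefined template) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `MethodInfo` fields, the two heuristics, and the template wording are hypothetical and are not taken from the paper, whose actual rules and templates are not given in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class MethodInfo:
    """Hypothetical facts a static analysis might extract for one method."""
    name: str
    is_const: bool                  # declared const, so it cannot modify the object
    returns_value: bool             # has a non-void return type
    modified_fields: list = field(default_factory=list)  # data members written in the body


def assign_stereotype(method):
    """Assign a stereotype using two illustrative heuristics (assumed, not the paper's)."""
    if method.is_const and method.returns_value:
        return "accessor"
    if method.modified_fields:
        return "mutator"
    return "unclassified"


# Illustrative templates keyed by stereotype (assumed wording).
TEMPLATES = {
    "accessor": "{name} is an accessor: it returns a value and does not modify the object.",
    "mutator": "{name} is a mutator: it modifies {fields}.",
    "unclassified": "{name}: no stereotype could be assigned.",
}


def summarize(method):
    """Generate a one-line summary from the stereotype and its template."""
    stereotype = assign_stereotype(method)
    return TEMPLATES[stereotype].format(
        name=method.name,
        fields=", ".join(method.modified_fields),
    )


if __name__ == "__main__":
    print(summarize(MethodInfo("getSize", is_const=True, returns_value=True)))
    print(summarize(MethodInfo("setSize", False, False, ["size_"])))
```

In a real tool the generated line would be written back into the code base as a comment above each method; the sketch stops at producing the text.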

Software Documentation: a Standard for the 21st Century

This paper describes a desk study in which a user manual dating from 1925, describing how to operate a table loom, was verified for conformance to an ISO standard that lists requirements for software user documentation and was last revised in 2008. A remarkably high degree of conformance was established. It is discussed whether this is coincidental, or more fundamentally related to the way in which the documentation of software tools is regarded as quantitatively rather than qualitatively different from the documentation of physical tools. The paper concludes by making a case for academic research resulting in guidelines for software documentation that take into account the specific nature of software, and the specific problems that users encounter when working with software.

A Structured Model for Software Documentation

1984

The concept of "structured programming" was developed to facilitate software production, but it has not carried over to documentation design. Two concepts of structure are relevant to user documentation for computer programs. The first is based on programming techniques that emphasize decomposition of tasks into discrete modules, while the second was developed in discourse analysis to explain strategies used by readers and to model their cognitive processes in forming mental models of text content. Consideration of the text production and text comprehension approaches together provides a basis for designing "user-friendly" software manuals. A model for structured documentation suggests the need for: modules to be appropriate macropropositions (global content of the text); clear identification of module function as a tutorial, operational, or reference component; planned ordering of modules and explicit superstructures to help readers identify effective strategies; and adequate access points to modules through such devices as indexes. An examination of the surface structures of 15 manuals for microcomputer file management indicated that structural guidance in existing manuals is inadequate. Nine references and the manuals that were examined are listed. (LMM)

Towards Empirical Evidence on the Comprehensibility of Natural Language Versus Programming Language

Understanding Innovation, 2019

In software design teams, communication between programmers and non-programming domain experts is an ongoing challenge. In this communication, source code documents could be a valuable artifact as they describe domain logic in an unambiguous way. Some programming languages, such as the Smalltalk programming language, try to make source code accessible. Its concise syntax and message-passing semantics are so close to basic English that it is likely to appeal even to non-programming domain experts. However, the inherent obscurity of technical programming details still poses a significant burden for text comprehension. We conducted a code-reading study in the form of a questionnaire through Amazon Mechanical Turk and SurveyMonkey. The results indicate that even in simple problem domains, a simple English text is more comprehensible than a simple Smalltalk program. Consequently, source code in its current text form should not be used as a reliable communication medium between programmers an...

Contextualized Programming Language Documentation

Proceedings of the 2022 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software

Learning the syntax and semantics of a new programming language is a challenge. It is common for learners to refer to language documentation many times and in many contexts as they build comfort and understanding. We review existing functional language documentation, finding that it tends to be organized according to the structure of the language. Each section interleaves narrative explanations, which introduce precise terminology that is then used consistently, with code examples. Sections often start with simpler special cases of a construct before considering it in full generality. To make use of language documentation, learners must step away from the code they are working with, e.g., in an exercise or tutorial, to locate and transfer knowledge from the documentation. We describe a system, ExplainThis, that automatically generates contextualized language documentation, structured based on our study of language documentation but specialized to the particular code at the cursor. This system is integrated into the structure editor of Hazel, a live functional environment. Documentation appears next to the editor and color is used as secondary notation to correlate the explanation with program terms. We also study syntactic and explanatory specificity with a formative user study. We find that participants desire documentation to be tailored to specific syntax of the code a user is working with, while allowing an adaptive level of specificity for code examples.

Programming in natural language: A descriptive analysis

Behavior Research Methods, 1985

This study investigated the utility of natural language in specifying procedures. Performance of programmers was compared with that of nonprogrammers on two types of problems: a "real-world" ordering task and a computer-like database search task. Programmers performed better than nonprogrammers in general, although overall differences between the groups were greater for the real-world problem domain. Analyses of protocols suggest that an unconstrained natural language programming environment is presently infeasible. Although constraints imposed in the problem specification do appear to improve performance, they alone are not sufficient to produce efficient natural language programming. It is argued that programming requires general problem-solving strategies and that at least some aspects of such strategies may be dependent upon the specific language in which they are implemented.

Qualities of Relevant Software Documentation: An Industrial Study

2002

This paper highlights the results of a survey of software professionals conducted in March and April, 2002. The results are compiled from 48 software professionals ranging from junior developers to managers and project leaders. One of the goals of this survey was to uncover the perceived relevance (or lack thereof) of software documentation, and the tools and technologies used to maintain, verify and validate such documents.

Analyze software documentation methods: Issues and Solutions

Journal of Emerging Technologies and Innovative Research, 2024

Software documentation matters a great deal in certain large-scale software projects, yet people in these organizations agreed that little was known about the structure, maintenance, and significance of these records. Another finding was that several small to medium-sized software projects showed no interest in using software documentation. People in these organizations claimed to believe in the value of documentation, but scheduling, budgeting, and other constraints left them few resources for it. They also struggle with selecting documentation tools. In this study, we identify the specific issues they face with software documentation and look for ways to make documentation simpler or, ideally, more relevant. We then propose solutions for the issues we find and offer guidance on tool selection.