Towards Empirical Evidence on the Comprehensibility of Natural Language Versus Programming Language

Studying the difference between natural and programming language corpora

Empirical Software Engineering, 2019

Code corpora, as observed in large software systems, are now known to be far more repetitive and predictable than natural language corpora. But why? Does the difference simply arise from the syntactic limitations of programming languages? Or does it arise from the differences in authoring decisions made by the writers of these natural and programming language texts? We conjecture that the differences are not entirely due to syntax, but also arise from the fact that reading and writing code is unnatural for humans and requires substantial mental effort; so, people prefer to write code in ways that are familiar to both reader and writer. To support this argument, we present results from two sets of studies: 1) a first set aimed at attenuating the effects of syntax, and 2) a second set aimed at measuring the repetitiveness of text written in other settings (e.g. second language, technical/specialized jargon), which are also effortful to write. We find that this repetition in source code is not entirely the result of grammar constraints, and thus some repetition must result from human choice. While the evidence we find of similar repetitive behavior in technical and learner corpora does not conclusively show that such language is used by humans to mitigate difficulty, it is consistent with that theory.
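For illustration, repetitiveness of this kind is commonly quantified by how well a simple statistical language model predicts held-out text: lower per-token cross-entropy means more repetitive, more predictable text. The sketch below is not the paper's methodology, just a minimal add-one-smoothed bigram model run on toy, hypothetical corpora:

```python
# A minimal sketch (not the paper's exact methodology): repetitiveness of a
# corpus is often proxied by how well a simple statistical language model
# predicts held-out text, i.e. lower cross-entropy => more repetitive/predictable.
import math
from collections import Counter

def bigram_cross_entropy(train_tokens, test_tokens):
    """Per-token cross-entropy (bits) of an add-one-smoothed bigram model."""
    vocab = set(train_tokens) | set(test_tokens)
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    v = len(vocab)

    def prob(prev, tok):
        return (bigrams[(prev, tok)] + 1) / (unigrams[prev] + v)

    bits = [-math.log2(prob(p, t)) for p, t in zip(test_tokens, test_tokens[1:])]
    return sum(bits) / len(bits)

# Toy corpora (hypothetical): code-like text reuses the same token patterns
# more than prose, so its cross-entropy tends to come out lower here.
code = "for i in range ( n ) : total = total + i".split() * 50
prose = ("the study compares how often writers reuse phrases when the "
         "medium itself makes writing effortful for most people").split() * 50

print(bigram_cross_entropy(code[:400], code[400:500]))
print(bigram_cross_entropy(prose[:400], prose[400:500]))
```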

Programming in natural language: A descriptive analysis

Behavior Research Methods, 1985

This study investigated the utility of natural language in specifying procedures. Performance of programmers was compared with that of nonprogrammers on two types of problems: a "real-world" ordering task and a computer-like database search task. Programmers performed better than nonprogrammers in general, although overall differences between the groups were greater for the real-world problem domain. Analyses of protocols suggest that an unconstrained natural language programming environment is presently infeasible. Although constraints imposed in the problem specification do appear to improve performance, they alone are not sufficient to produce efficient natural language programming. It is argued that programming requires general problem-solving strategies and that at least some aspects of such strategies may be dependent upon the specific language in which they are implemented.

Evaluating Code Readability and Legibility: An Examination of Human-centric Studies

2020 IEEE International Conference on Software Maintenance and Evolution (ICSME)

Reading code is an essential activity in software maintenance and evolution. Several studies with human subjects have investigated how different factors, such as the employed programming constructs and naming conventions, can impact code readability, i.e., what makes a program easier or harder to read and apprehend by developers, and code legibility, i.e., what influences the ease of identifying elements of a program. These studies evaluate readability and legibility by means of different comprehension tasks and response variables. In this paper, we examine these tasks and variables in studies that compare programming constructs, coding idioms, naming conventions, and formatting guidelines, e.g., recursive vs. iterative code. To that end, we have conducted a systematic literature review where we found 54 relevant papers. Most of these studies evaluate code readability and legibility by measuring the correctness of the subjects' results (83.3%) or simply asking their opinions (55.6%). Some studies (16.7%) rely exclusively on the latter variable. There are still few studies that monitor subjects' physical signs, such as brain activation regions (5%). Moreover, our study shows that some variables are multi-faceted. For instance, correctness can be measured as the ability to predict the output of a program, answer questions about its behavior, or recall parts of it. These results make it clear that different evaluation approaches require different competencies from subjects, e.g., tracing the program vs. summarizing its goal vs. memorizing its text. To assist researchers in the design of new studies and improve our comprehension of existing ones, we model program comprehension as a learning activity by adapting a preexisting learning taxonomy. This adaptation indicates that some competencies, e.g., tracing, are often exercised in these evaluations whereas others, e.g., relating similar code snippets, are rarely targeted.
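As a concrete, hypothetical example of the construct pairs such studies compare, the snippet below shows a recursive and an iterative version of the same small computation; a tracing task would ask subjects to predict their output, while a summarization task would ask what they compute:

```python
# Illustrative only: the kind of construct pair such studies compare.
# Subjects might be asked to predict the output of each version (tracing)
# or to summarize what it computes (a higher-level competency).

def sum_digits_recursive(n: int) -> int:
    """Recursive variant."""
    if n < 10:
        return n
    return n % 10 + sum_digits_recursive(n // 10)

def sum_digits_iterative(n: int) -> int:
    """Iterative variant of the same computation."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total

assert sum_digits_recursive(1234) == sum_digits_iterative(1234) == 10
```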

The effect of poor source code lexicon and readability on developers' cognitive load

Proceedings of the 26th Conference on Program Comprehension, 2018

It has been well documented that a large portion of the cost of any software lies in the time spent by developers in understanding a program's source code before any changes can be undertaken. One of the main contributors to software comprehension, by subsequent developers or by the authors themselves, has to do with the quality of the lexicon (i.e., the identifiers and comments) that is used by developers to embed domain concepts and to communicate with their teammates. In fact, previous research shows that there is a positive correlation between the quality of identifiers and the quality of a software project. Results suggest that a poor-quality lexicon impairs program comprehension and consequently increases the effort that developers must spend to maintain the software. However, we do not yet have empirical evidence of the relationship between the quality of the lexicon and the cognitive load that developers experience when trying to understand a piece of software. Given the associated costs, there is a critical need to empirically characterize the impact of the quality of the lexicon on developers' ability to comprehend a program. In this study, we explore the effect of poor source code lexicon and readability on developers' cognitive load as measured by a cutting-edge and minimally invasive functional brain imaging technique called functional Near Infrared Spectroscopy (fNIRS). Additionally, while developers perform software comprehension tasks, we map cognitive load data to source code identifiers using an eye tracking device. Our results show that the presence of linguistic antipatterns in source code significantly increases the developers' cognitive load.
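For readers unfamiliar with the term, the sketch below is a hypothetical Python example of one well-known family of linguistic antipatterns (a name that reads like a pure accessor but hides a state change), together with a rewording that keeps the lexicon honest; it is illustrative only and not taken from the study's materials:

```python
# Hypothetical snippet illustrating a linguistic antipattern of the kind the
# study associates with higher cognitive load: the name promises a simple
# accessor, but the method also mutates state.

class Order:
    def __init__(self, items):
        self.items = items
        self.discount_applied = False

    # Antipattern: reads like a pure getter, yet it changes object state.
    def get_total(self):
        if not self.discount_applied:
            self.items = [price * 0.9 for price in self.items]
            self.discount_applied = True
        return sum(self.items)

    # Clearer lexicon: the side effect and the query are named separately.
    def apply_discount(self):
        if not self.discount_applied:
            self.items = [price * 0.9 for price in self.items]
            self.discount_applied = True

    def total(self):
        return sum(self.items)
```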

Assessing Consensus of Developers' Views on Code Readability

arXiv (Cornell University), 2024

The rapid rise of Large Language Models (LLMs) has changed software development, with tools like Copilot, JetBrains AI Assistant, and others boosting developers' productivity. However, developers now spend more time reviewing code than writing it, highlighting the importance of Code Readability for code comprehension. Our previous research found that existing Code Readability models were inaccurate in representing developers' notions and revealed a low consensus among developers, highlighting a need for further investigations in this field. Building on this, we surveyed 10 Java developers with similar coding experience to evaluate their consensus on Code Readability assessments and related aspects. We found significant agreement among developers on Code Readability evaluations and identified specific code aspects strongly correlated with Code Readability. Overall, our study sheds light on Code Readability within LLM contexts, offering insights into how these models can align with developers' perceptions of Code Readability, enhancing software development in the AI era.

Attributes Influencing the Reading and Comprehension of Source Code – Discussing Contradictory Evidence

CLEI Electronic Journal

Background: Coding guidelines can be contradictory despite their intention of providing a universal perspective on source code quality. For instance, five attributes (code size, semantic complexity, internal documentation, layout style, and identifier length) out of 13 presented contradictions regarding their influence (positive or negative) on source code readability and comprehensibility. Aims: To investigate source code attributes and their influence on readability and comprehensibility. Method: A literature review was used to identify source code attributes impacting source code reading and comprehension, and an empirical study was performed to support the assessment of four attributes that presented empirical contradictions in the technical literature. Results: Regardless of participants' experience, all participants showed more positive comprehensibility perceptions for Python snippets with more lines of code. However, their readability perceptions regarding code size were...
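To make the code-size attribute concrete, the snippet below shows two hypothetical Python versions of the same behavior, one compact and one spread over more lines with intermediate names, the kind of stimuli such perception studies contrast:

```python
# Hypothetical stimuli of the kind such a study might contrast: the same
# behavior written compactly versus spread over more lines with intermediate
# names, letting participants rate the readability/comprehensibility of each.

def mean_positive_compact(values):
    return sum(v for v in values if v > 0) / max(1, len([v for v in values if v > 0]))

def mean_positive_verbose(values):
    positives = [value for value in values if value > 0]
    if not positives:
        return 0
    total = sum(positives)
    count = len(positives)
    return total / count
```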

Programming language, natural language? Supporting the diverse computational activities of novice programmers

Journal of Visual Languages & Computing, 2017

Given the current focus on teaching computational concepts to all from an early age, combined with the growing trend to empower end users to become producers of technology rather than mere consumers, we consider the issue of "computational notation". Specifically, where the goal is to help individuals develop their understanding of computation and/or use computation in real-world settings, we question whether natural language might be a preferred notation to traditional programming languages, given its familiarity and ubiquity. We describe three empirical studies investigating the use of natural language for computation in which we found that although natural language provides support for understanding computational concepts, it introduces additional difficulties when used for coding. We distilled our findings into a set of design guidelines for novice programming environments which consider the ways in which different notations, including natural language, can best support the various activities that comprise programming. These guidelines were embodied in Flip, a bi-modal programming language used in conjunction with the Electron toolset, which allows young people to create their own commercial-quality, narrative-based role-playing games. Two empirical studies on the use of Flip in three different real-world contexts considered the extent to which the design guidelines support ease of use and an understanding of computation. The guidelines have potential to be of use both in analysing the use of natural language in existing novice programming environments, and in the design of new ones.

Assessing the quality factors found in in-line documentation written in natural language: The JavadocMiner

Data & Knowledge Engineering, 2013

An important software engineering artifact used by developers and maintainers to assist in software comprehension and maintenance is source code documentation.
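The general idea can be sketched with a few natural-language heuristics over in-line documentation; the Python example below is only an analogue of that idea (JavadocMiner itself targets Javadoc and uses richer NLP metrics), flagging doc comments that are missing, very short, or silent about the function's parameters:

```python
# A minimal sketch of the general idea (not JavadocMiner's actual metrics):
# flag doc comments that are missing, too short, or fail to mention the
# documented function's parameters.
import inspect

def doc_quality_issues(func, min_words=5):
    issues = []
    doc = inspect.getdoc(func) or ""
    if not doc:
        return ["missing doc comment"]
    if len(doc.split()) < min_words:
        issues.append("doc comment is very short")
    for name in inspect.signature(func).parameters:
        if name not in doc:
            issues.append(f"parameter '{name}' not mentioned")
    return issues

# Hypothetical function used only to exercise the checks above.
def transfer(amount, account_id):
    """Move funds."""
    ...

print(doc_quality_issues(transfer))
# ['doc comment is very short', "parameter 'amount' not mentioned",
#  "parameter 'account_id' not mentioned"]
```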