Spencer Smith - Profile on Academia.edu

Papers by Spencer Smith

arXiv (Cornell University), Feb 1, 2023

Current software development is often quite code-centric and aimed at short-term deliverables, due to various contextual forces (such as the need for new revenue streams from many individual buyers). We're interested in software where different forces drive the development. Well-understood domains and long-lived software provide one such context. A crucial observation is that software artifacts that are currently handwritten contain considerable duplication. By using domain-specific languages and generative techniques, we can capture the contents of many of the artifacts of such software. Assuming an appropriate codification of domain knowledge, we find that the resulting de-duplicated sources are shorter and closer to the domain. Our prototype, Drasil, indicates improvements to traceability and change management. We're also hopeful that this could lead to long-term productivity improvements for software where these forces are at play. 2012 ACM Subject Classification: Software and its engineering → Application specific development environments; Software and its engineering → Requirements analysis; Software and its engineering → Specification languages; Software and its engineering → Automatic programming

arXiv (Cornell University), Nov 26, 2019

We present GOOL, a Generic Object-Oriented Language. It demonstrates that a language, with the right abstractions, can capture the essence of object-oriented programs. We show how GOOL programs can be used to generate human-readable, documented and idiomatic source code in multiple languages. Moreover, in GOOL, it is possible to express common programming idioms and patterns, from simple library-level functions, to simple tasks (command-line arguments, list processing, printing), to more complex patterns, such as methods with a mixture of input, output and in-out parameters, and finally Design Patterns (such as Observer, State and Strategy). GOOL is an embedded DSL in Haskell that can generate code in Python, Java, C#, and C++.
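The idea of generating idiomatic code in several languages from one language-neutral description can be illustrated with a toy sketch. This is not GOOL's API (GOOL is embedded in Haskell); the names Param, Add, Function, to_python and to_java are hypothetical, and the sketch assumes a function body that is a single returned expression over double-valued parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """A named parameter (hypothetical node type, not part of GOOL)."""
    name: str


@dataclass(frozen=True)
class Add:
    """Sum of two sub-expressions."""
    left: object
    right: object


@dataclass(frozen=True)
class Function:
    """Language-neutral function: name, parameters, one returned expression, doc text."""
    name: str
    params: tuple
    body: object
    doc: str


def render_expr(expr):
    # The expression syntax used here happens to be valid in both targets.
    if isinstance(expr, Param):
        return expr.name
    if isinstance(expr, Add):
        return f"({render_expr(expr.left)} + {render_expr(expr.right)})"
    raise TypeError(f"unsupported expression node: {expr!r}")


def to_python(fn):
    # Render as a documented Python function.
    args = ", ".join(p.name for p in fn.params)
    lines = [
        f"def {fn.name}({args}):",
        f'    """{fn.doc}"""',
        f"    return {render_expr(fn.body)}",
    ]
    return "\n".join(lines) + "\n"


def to_java(fn):
    # Render as a documented static Java method; all values are assumed to be double.
    args = ", ".join(f"double {p.name}" for p in fn.params)
    lines = [
        f"/** {fn.doc} */",
        f"public static double {fn.name}({args}) {{",
        f"    return {render_expr(fn.body)};",
        "}",
    ]
    return "\n".join(lines) + "\n"


# One description, two renderings.
add_fn = Function(
    name="add",
    params=(Param("x"), Param("y")),
    body=Add(Param("x"), Param("y")),
    doc="Return the sum of x and y.",
)
print(to_python(add_fn))
print(to_java(add_fn))
```

Both renderings come from the same Function value, so a change to the description propagates to every target language. A real generator would also need statements, types, classes and per-language idioms, which this sketch omits.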

arXiv (Cornell University), Oct 21, 2021

To improve software development methods and tools for research software, we first need to understand the current state of the practice. Therefore, we have developed a methodology for assessing the state of the software development practices for a given research software domain. The methodology is applied to one domain at a time in recognition that software development in different domains is likely to have adopted different best practices. Moreover, providing a means to measure different domains facilitates comparison of development practices between domains. For each domain we wish to answer questions such as: i) What artifacts (documents, code, test cases, etc.) are present? ii) What tools are used? iii) What principles, process and methodologies are used? iv) What are the pain points for developers? v) What actions are used to improve qualities like maintainability and reproducibility? To answer these questions, our methodology prescribes the following steps: i) Identify the domain; ii) Identify a list of candidate software packages; iii) Filter the list to a length of about 30 packages; iv) Gather source code and documentation for each package; v) Collect repository-related data on each software package, like number of stars, number of open issues, number of lines of code; vi) Fill in the measurement template (the template consists of 108 questions to assess 9 qualities (including the qualities of installability, usability and visibility)); vii) Interview developers (the interview consists of 20 questions and takes about an hour); viii) Rank the software using the Analytic Hierarchy Process (AHP); and, ix) Analyze the data to answer the questions posed above.
A domain expert should be engaged throughout the process, to ensure that implicit information about the domain is properly represented and to assist with conducting an analysis of the commonalities and variabilities between the 30 selected packages. Using our methodology, spreadsheet templates and AHP tool, we estimate (based on our experience with using the process) the time to complete an assessment for a given domain at 173 person hours.

His teaching interests include McMaster's freshman program, including the cornerstone design course. Dr. Doyle is a leading member of the faculty team, enriching and transforming McMaster's curriculum to meet emerging challenges of the profession. His research interests include biomedical signal processing, human-computer interfacing (HCI), brain computer interfacing (BCI), machine learning, and simulation for education. Dr. Doyle earned his PhD at the University of Western Ontario. He is a Professional Engineer in the province of Ontario and a member of the ASEE and IEEE, among other societies.

Software qualities, like maintainability, reproducibility and verifiability, often suffer for Scientific Computing Software (SCS) because of inadequate documentation, traceability, change-enabling design, and testing. Software developers would like to spend more time on documentation, and other software activities, but time and resource pressures frequently make this an unrealistic goal. Ideally, developers should be able to create traceable documentation, design, code, build scripts and tests, without the drudgery of writing and maintaining them. A potential solution is to generate the documentation, code and tests automatically by using Domain Specific Languages (DSLs) over a base of scientific, computing and documentation knowledge. This is the approach that is proposed for a new scientific software development framework called Drasil. By having one source of knowledge, along with rules for transforming and consistency checking, the qualities of completeness, consistency and trac...

Numerical investigation of the reliability of a posteriori error estimation for advection-diffusion equations

Communications in Numerical Methods in Engineering, 2007

Document driven certification of computational science and engineering software

Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering - SE-HPCCSE '13, 2013

This paper presents a documentation and development methodology to facilitate the certification of Computational Science and Engineering (CSE) software that is produced by professional end user developers to solve mathematical models of physical systems. To study the problems faced during quality assurance and certification activities, a case study was performed on legacy software used by a nuclear power generating company for safety analysis in a nuclear reactor. Although no errors were uncovered in the code, the documentation still needed significant updating for certification, since it was incomplete and inconsistent. During the case study, 27 issues were found with the documentation. This work proposes improvements to the case study software and other CSE software via a new template for the Software Requirements Specification (SRS) that clearly and sufficiently states the requirements, while satisfying the desired qualities for a good SRS. For developing the design and implementation, this paper suggests Literate Programming (LP) as an alternative to traditional structured programming. Literate Programming documents the numerical algorithms and the logic behind the development and the code together in the same document, the Literate Programmer's Manual (LPM). The LPM is developed in connection with the SRS. The explicit traceability between the theory, numerical algorithms and implementation (code) facilitates completeness and consistency, and simplifies the process of verification and the associated certification.

19th Conference on Software Engineering Education & Training (CSEET'06)

This paper reports on the activities and results from the 2nd International Workshop on Software Engineering Course Projects (SWECP 2005), which was held on October 18, 2005 in Toronto, Canada. Creating software engineering course projects for undergraduate students is a challenging task. The instructor must carefully balance the conflicting goals of academic rigor and industrial relevance. Some of the fundamental characteristics of software engineering projects (e.g., team-based, large-scale, long-lived) are difficult to realize within the constraints of a university course in a single semester. This is particularly true when dealing with young students who may lack the real-world experience needed to appreciate some of the more subtle aspects of software engineering. This workshop explored how educators and industry can work together to develop a more rewarding educational experience for all stakeholders involved. Several key themes emerged from the workshop, including the importance of forming teams that are fair and balanced, the challenges in selecting a project that engages the students and meets the goals of the course, and the need for knowledge transfer amongst instructors.

We propose the development of a full-featured, production quality library of validating routines for use by the wide community of applications developers. What would it take to develop such a library? How could it be done?

Software Quality Grades for Seismology Software

The data provides a summary of the state of development practice for seismology software (as of August 2017). The summary is based on grading a set of 30 seismology products using a template of 56 questions based on 13 software qualities. The software qualities were further divided into 4 aspects: product, implementation, design and process. The template used to grade the software is found in the GradingTemplatedDocument.pdf file. Each quality is measured with a series of questions. To avoid ambiguity, the responses are quantified wherever possible (e.g. yes/no answers). The goal is for measures that are visible, measurable and feasible in a short time with limited domain knowledge. Unlike a comprehensive software review, this template does not grade on functionality and features. Therefore, it is possible that a relatively featureless product can outscore a feature-rich product. A virtual machine is used to provide an optimal testing environment for each software product. During the process of grading the 30 software products, it is much easier to create a new virtual machine to test the software, rather than using the host operating system and file system. The raw data obtained by measuring each software product is in SoftwareGrading-Seismology.xlsx. Each line in this file corresponds to between 2 and 4 hours of measurement time by a software engineer. The overall impression scores for each product are summarized in one of the tabs in the AHP_seismology.xlsx spreadsheet. These overall impression numbers are then used for a relative comparison between products. The relative comparison is used to populate the AHP tables. Using the mathematics for AHP, the numbers are then converted to a ranking of the 30 software products on each of the 13 qualities, and each of the 4 aspects.
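The step from overall impression scores to an AHP ranking can be sketched as follows. The description above does not say how scores become pairwise comparisons, so the mapping in pairwise_matrix (score difference plus one, capped at 9 on the usual 1 to 9 AHP scale) is an assumption made for illustration, as are the function names and the example scores. Priorities are computed with row geometric means, a common approximation to the principal eigenvector of the comparison matrix.

```python
import math


def pairwise_matrix(scores):
    """Build a reciprocal comparison matrix from per-product scores.

    Assumed mapping (not taken from the data set): the strength of preference
    is 1 + |score difference|, capped at 9, and inverted when product i
    scores lower than product j.
    """
    n = len(scores)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            diff = scores[i] - scores[j]
            strength = min(9.0, 1.0 + abs(diff))
            matrix[i][j] = strength if diff >= 0 else 1.0 / strength
    return matrix


def priorities(matrix):
    """Approximate the AHP priority vector with normalized row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank(names, scores):
    """Return (name, weight) pairs sorted from highest to lowest priority."""
    weights = priorities(pairwise_matrix(scores))
    return sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)


# Hypothetical products and impression scores for one quality.
ranking = rank(["A", "B", "C"], [7, 4, 9])
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

Repeating this for each quality gives one ranking per quality. A full AHP study would also check the consistency of each comparison matrix, which the sketch leaves out.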

Software Quality Grades for GIS Software

The data provides a summary of the state of development practice for Geographic Information Systems (GIS) software (as of August 2017). The summary is based on grading a set of 30 GIS products using a template of 56 questions based on 13 software qualities. The products range in scope and purpose from complete desktop GIS systems, to stand-alone tools, to programming libraries/packages. The template used to grade the software is found in the TabularSummaries.zip file. Each quality is measured with a series of questions. To avoid ambiguity, the responses are quantified wherever possible (e.g. yes/no answers). The goal is for measures that are visible, measurable and feasible in a short time with limited domain knowledge. Unlike a comprehensive software review, this template does not grade on functionality and features. Therefore, it is possible that a relatively featureless product can outscore a feature-rich product. A virtual machine is used to provide an optimal testing environment for each software product. During the process of grading the 30 software products, it is much easier to create a new virtual machine to test the software on, rather than using the host operating system and file system. The raw data obtained by measuring each software product is in SoftwareGrading-GIS.xlsx. Each line in this file corresponds to between 2 and 4 hours of measurement time by a software engineer. The results are summarized for each quality in the TabularSummaries.zip file, as a tex file and compiled pdf file.

Case Studies in Model Manipulation for Scientific Computing

Lecture Notes in Computer Science

The same methodology is used to develop 3 different applications. We begin by using a very expressive, appropriate Domain Specific Language, to write down precise problem definitions, using their most natural formulation. Once defined, the problems form an implicit definition of a unique solution. From the problem statement, our model, we use mathematical transformations to make the problem simpler to solve computationally. We call this crucial step “model manipulation.” With the model rephrased in more computational terms, we can also derive various quantities directly from this model, which greatly simplify traditional numeric solutions, our eventual goal. From all this data, we then use standard code generation and code transformation techniques to generate lower-level code to perform the final numerical steps. This methodology is very flexible, generates faster code, and generates code that would have been all but impossible for a human programmer to get correct.

2009 ICSE Workshop on Software Engineering for Computational Science and Engineering, 2009

Polymer Engineering & Science, 2000

A model is presented for simulating two-dimensional, nonisothermal film casting of a viscous polymer. The model accommodates the effects of inertia and gravity, and allows the thickness of the film to vary across the width, but it excludes film sag and die swell. Based on the simulation results, three factors are shown to contribute to reducing neck-in and promoting a uniform thickness: the self-weight of the material, for low viscosity polymers; nonuniform thickness and/or velocity profiles at the die; and cooling of the film, especially when localized cooling jets are employed.

Lecture Notes in Computer Science

We present an implementation of double precision interval arithmetic using the single-instruction-multiple-data SSE-2 instruction and register set extensions. The implementation is part of a package for exact real arithmetic, which defines the interval arithmetic variation that must be used: incorrect operations such as division by zero cause exceptions, loose evaluation of the operations is in effect, and performance is more important than tightness of the produced bounds. The SSE2 extensions are suitable for the job, because they can be used to operate on a pair of double precision numbers and include separate rounding mode control and detection of the exceptional conditions. The paper describes the ideas we use to fit interval arithmetic to this set of instructions, shows a performance comparison with other freely available interval arithmetic packages, and discusses possible very simple hardware extensions that can significantly increase the performance of interval arithmetic.
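The interval arithmetic variation described above (exceptions on division by zero, loose but safe bounds) can be illustrated with a small software sketch. It does not use SSE2 or hardware rounding modes; instead it assumes Python 3.9 or later and widens each round-to-nearest result by one floating-point step with math.nextafter, which gives bounds that are valid but looser than directed rounding would. The Interval class and its method names are hypothetical.

```python
import math
from dataclasses import dataclass


def _down(x):
    # Next representable double below x: a safe lower bound.
    return math.nextafter(x, -math.inf)


def _up(x):
    # Next representable double above x: a safe upper bound.
    return math.nextafter(x, math.inf)


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of doubles (illustrative, not the paper's package)."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(_down(self.lo + other.lo), _up(self.hi + other.hi))

    def __mul__(self, other):
        # The extreme products of the endpoints bound the true product.
        products = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(_down(min(products)), _up(max(products)))

    def __truediv__(self, other):
        # Division by an interval containing zero is treated as an error.
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        quotients = [
            self.lo / other.lo, self.lo / other.hi,
            self.hi / other.lo, self.hi / other.hi,
        ]
        return Interval(_down(min(quotients)), _up(max(quotients)))

    def contains(self, x):
        return self.lo <= x <= self.hi


# The rounded sum of 0.1 and 0.2 lies inside the widened interval.
total = Interval(0.1, 0.1) + Interval(0.2, 0.2)
print(total, total.contains(0.1 + 0.2))
```

Widening by one step on each side trades tightness for simplicity, in the same spirit as the stated priority of performance over tight bounds. A hardware implementation would instead select the rounding direction for each endpoint.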

2nd CASCON Workshop on Software Engineering for Science

Proceedings of the 2009 Conference of the Center for Advanced Studies on Collaborative Research - CASCON '09, 2009

Scientific software is application software that supplies data to support decisions in a field of science or engineering. Computational models of climate, geographic information systems to study bird habitats, analysis software for the safe operation of nuclear generating stations and software to study stresses on concrete structures are only a few of the thousands of examples. This scientific software is largely written by scientists, not software specialists. In some organizations, there may be a team of scientists and software specialists, but the complexity of the science requires the participation of the scientist, often as the software developer.

GOOL: a generic object-oriented language

Proceedings of the 2020 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation

We present GOOL, a Generic Object-Oriented Language. GOOL shows that with the right abstractions, a language can capture the essence of object-oriented programs. GOOL generates human-readable, documented and idiomatic code in Python, Java, C#, and C++. In it, we can express common programming idioms and patterns.

ArXiv, 2021

Our goal is to identify inhibitors and catalysts for productive long-term scientific software development. The inhibitors and catalysts could take the form of processes, tools, techniques, environmental factors (like working conditions) and software artifacts (such as user manuals, unit tests, design documents and code). The effort (time) invested in catalysts will pay off in the long-term, while inhibitors will take up resources, and can lower product quality. Developer surveys on inhibitors and catalysts will yield responses as varied as the education and experiential backgrounds of the respondents. Although well-meaning, responses will predictably be biased. For instance, developers may be guilty of the sunk cost fallacy, promoting a technology they have invested considerable hours in learning, even if the current costs outweigh the benefits. Likewise developers may recommend against spending time on proper requirements, not as an indication that requirements are not valuable, only...

State of Sustainability for Research Software, SIAM CSE21, Software Productivity and Sustainability for CSE

We wish to understand the sustainability of current research software development practices. Sustainability is here defined to mean production of software that meets the (explicit and implicit) requirements of the present, while also allowing for cost-effective modifications in the future. To assess current sustainability, we will: i) analyze existing open source projects to determine what documents, code and scripts are produced; ii) determine what concepts are captured by those artifacts; iii) analyze textbooks and other authoritative sources to determine what documents, code and scripts are recommended; iv) determine what concepts are supposed to be captured by those artifacts. We will conduct some case studies on specific domains (such as medical imaging software and Lattice Boltzmann Solvers) via ranking representative software in pair-wise comparisons along multiple quality measures. For the top software thus identified, we will conduct simple experiments to assess usability an...

Dagstuhl Publications, 2008

It took us some time to find this out.
