Per Runeson | Lund University (original) (raw)

Conference Papers by Per Runeson

Duplicate detection is a fundamental part of issue management. Systems able to predict whether a ... more Duplicate detection is a fundamental part of issue management. Systems able to predict whether a new defect report will be closed as a duplicate, may decrease costs by limiting rework and collecting related pieces of information. Previous work relies on the textual content of the defect reports, often assuming that better results are obtained if the title is weighted as more important than the descrip- tion. We conduct a conceptual replication of a well-cited study conducted at Sony Ericsson, using Apache Lucene for searching in the public Android defect repository. In line with the original study, we explore how varying the weighting of the title and the description affects the accuracy. Our work shows the poten- tial of using Lucene as a scalable solution for duplicate detection. Also, we show that Lucene obtains the best results the when the defect report title is weighted three times higher than the description, a bigger difference than has been previously acknowledged.

Background. Test automation is a widely used technique to increase the efficiency of software tes... more Background. Test automation is a widely used technique to increase the efficiency of software testing. However, executing more test cases increases the effort required to analyze test results. At Qlik, automated tests run nightly for up to 20 development branches, each containing thousands of test cases, resulting in information overload. Aim. We therefore develop a tool that supports the analy- sis of test results. Method. We create NIOCAT, a tool that clusters similar test case fail- ures, to help the analyst identify underlying causes. To evaluate the tool, experiments on manually created subsets of failed test cases representing different use cases are conducted, and a focus group meeting is held with test analysts at Qlik. Results. The case study shows that NIOCAT creates accurate clusters, in line with analyses performed by human analysts. Further, the potential time-savings of our approach is confirmed by the participants in the focus group. Conclusions. NIOCAT provides a feasible complement to current au-tomated testing practices at Qlik by reducing information overload.

Regression testing practices in industry have to be better understood, both for the industry itse... more Regression testing practices in industry have to be better understood, both for the industry itself and for the research community. Method : We conducted a qualitative industry survey by i) running a focus group meeting with 15 industry participants and ii) validating the outcome in an on line questionnaire with 32 respondents. Results: Regression testing needs and practices vary greatly between and within organizations and at different stages of a project. The importance and challenges of automation is clear from the survey. Conclusions: Most of the findings are general testing issues and are not specific to regression testing. Challenges and good practices relate to test automation and testability issues.

Open Innovation (OI) has gained significant attention since the term was introduced in 2003. Howe... more Open Innovation (OI) has gained significant attention since the term was introduced in 2003. However, little is known whether general software testing processes are well suited for OI. An exploratory case study on the Acceptance Test Harness (ATH) is conducted to investigate OI testing activities of Jenkins. As far as the research methodology is concerned, we extracted the change log data of ATH followed by five interviews with key contributors in the development of ATH. The findings of the study are threefold. First, it highlights the key stakeholders involved in the development of ATH. Second, the study compares the ATH testing activities with ISO/IEC/IEEE testing process and presents a tailored process for software testing in OI. Finally, the study underlines some key challenges that software intensive organizations face while working with the testing in OI.

Proc. of the 8th International Symposium on Empirical Software Engineering and Measurement, Sep 18, 2014

Context: Duplicate detection is a fundamental part of issue management. Systems able to predict w... more Context: Duplicate detection is a fundamental part of issue management. Systems able to predict whether a new defect report will be closed as a duplicate, may decrease costs by limiting rework and collecting related pieces of information. Previous work relies on the textual content of the defect reports, often assuming that better results are obtained if the title is weighted as more important than the description. Method: We conduct a conceptual replication of a well-cited study conducted at Sony Ericsson, using
Apache Lucene for searching in the public Android defect repository. In line with the original study, we explore how varying the weighting of the title and the description affects the accuracy. Results and conclusions: Our work shows the potential of using Lucene as a scalable solution for duplicate detection. Also, we show that Lucene obtains the best results the when the defect report title is weighted three times higher than the description, a bigger differencethan has been previously acknowledged.

Background: Development of complex, software intensive systems generates large amounts of inform... more Background: Development of complex, software intensive
systems generates large amounts of information. Several
researchers have developed tools implementing information
retrieval (IR) approaches to suggest traceability links among
artifacts. Aim: We explore the consequences of the fact that
a majority of the evaluations of such tools have been focused
on benchmarking of mere tool output. Method: To illustrate this
issue, we have adapted a framework of general IR evaluations to a context taxonomy specifically for IR-based traceability recovery. Furthermore, we evaluate a previously proposed experimental framework by conducting a study using two publicly available tools on two datasets originating from development of embedded software systems. Results: Our study shows that even though both datasets contain software artifacts from embedded development, the characteristics of the two datasets differ considerably, and consequently the traceability outcomes. Conclusions: To enable replications and secondary studies, we suggest that datasets should be thoroughly characterized in future studies on traceability
recovery, especially when they can not be disclosed. Also, while
we conclude that the experimental framework provides useful
support, we argue that our proposed context taxonomy is a useful complement. Finally, we discuss how empirical evidence of the feasibility of IR-based traceability recovery can be strengthened in future research.

Proc. of the 17th European Conference on Software Maintenance and Reengineering, Mar 6, 2013

Completely analyzed and closed issue reports in software development projects, particularly in th... more Completely analyzed and closed issue reports in software development projects, particularly in the development of safety-critical systems, often carry important information about issue-related change locations. These locations may be in the source code, as well as traces to test cases affected by the issue, and related design and requirements documents. In order to help developers analyze new issues, knowledge about issue clones and duplicates, as well as other relations between the new issue and existing issue reports would be useful. This paper analyses, in an exploratory study, issue reports contained in two Issue Management Systems (IMS) containing approximately 20.000 issue reports. The purpose of the analysis is to gain a better understanding of relationships between issue reports
in IMSs. We found that link-mining explicit references can reveal complex networks of issue reports. Furthermore, we found that textual similarity analysis might have the potential to complement the explicitly signaled links by recommending additional relations. In line with work in other fields, links between software artifacts have a potential to improve search and navigation in large software engineering projects.

Proceedings of the 7th International Conference on Software Testing, Verification and Validation, Mar 31, 2014

Background: Test managers have to repeatedly select test cases for test activities during evoluti... more Background: Test managers have to repeatedly select test cases for test activities during evolution of large software systems. Researchers have widely studied automated test scoping, but have not fully investigated decision support with human interaction. We previously proposed the introduction of visual analytics for this purpose. Aim: In this empirical study we investigate how to design such decision support. Method: We
explored the use of visual analytics using heat maps of historical
test data for test scoping support by letting test managers
evaluate prototype visualizations in three focus groups with in
total nine industrial test experts. Results: All test managers in
the study found the visual analytics useful for supporting test
planning. However, our results show that different tasks and
contexts require different types of visualizations. Conclusion:
Important properties for test planning support are: ability to
overview testing from different perspectives, ability to filter and
zoom to compare subsets of the testing with respect to various
attributes and the ability to manipulate the subset under analysis
by selecting and deselecting test cases. Our results may be used
to support the introduction of visual test analytics in practice.

Proc. of the 7th International Symposium on Empirical Software Engineering and Measurement

Several researchers have proposed creating after-the-fact structure among software artifacts usin... more Several researchers have proposed creating after-the-fact structure among software artifacts using trace recovery based on Information Retrieval (IR) approaches. Due to significant variation points in previous studies, results are not easily aggregated. We provide an initial overview picture of the outcome of previous evaluations. Based on a systematic mapping study, we perform a synthesis of published research. Our results show that there are no empirical evidence that any IR model outperforms another model consistently. We also display a strong dependency between the P-R values and the input datasets. Finally, our mapping of Precision and Recall (P-R) values on the possible output space highlights the difficulty of recovering accurate trace links using naïve cut-off strategies. Thus, our work presents empirical evidence that confirms several previous claims on IR-based trace recovery and stresses the needs for empirical evaluations beyond the basic P-R "race".

Background. Test automation is a widely used technique to increase the efficiency of software tes... more Background. Test automation is a widely used technique
to increase the efficiency of software testing. However,
executing more test cases increases the effort required to analyze test results. At Qlik, automated tests run nightly for up to 20 development branches, each containing thousands of test cases, resulting in information overload. Aim. We therefore develop a tool that supports the analysis of test results. Method. We create NIOCAT, a tool that clusters similar test case failures, to help the analyst identify underlying causes. To evaluate the tool, experiments on manually created subsets of failed test cases representing different use cases are conducted, and a focus group meeting is held with test analysts at Qlik. Results. The case study shows that NIOCAT creates accurate clusters, in line with analyses performed by human analysts. Further, the potential time-savings of our approach is confirmed by the participants in the focus group. Conclusions. NIOCAT provides a feasible complement to current automated testing practices at Qlik by reducing information overload.

Papers by Per Runeson

Software Quality Journal, Dec 3, 2022

Empirical Software Engineering, Aug 18, 2013

Background. In formal experiments on software engineering, the number of factors that may impact ... more Background. In formal experiments on software engineering, the number of factors that may impact an outcome is very high. Some factors are controlled and change by design, while others are are either unforeseen or due to chance. Aims. This paper aims to explore how context factors change in a series of formal experiments and to identify implications for experimentation and replication practices to enable learning from experimentation. Method. We analyze three experiments on code inspections and structural unit testing. The first two experiments use the same experimental design and instrumentation (replication), while the third, conducted by different researchers, replaces the programs and adapts defect detection methods accordingly (reproduction). Experimental procedures and location also differ between the experiments. Results. Contrary to expectations, there are significant differences between the original experiment and the replication, as well as compared to the reproduction. Some of the differences are due to factors other than the ones designed to vary between experiments, indicating the sensitivity to context factors in software engineering experimentation. Conclusions. In aggregate, the analysis indicates that reducing the complexity of software engineering experiments should be considered by researchers who want to obtain reliable and repeatable empirical measures.

Empirical Software Engineering, Mar 17, 2006

Integrating agile software development into stage-gate managed product development Karlström, Dan... more Integrating agile software development into stage-gate managed product development Karlström, Daniel; Runeson, Per

Helgesson, Daniel 2021 Link to publication Citation for published version (APA): Helgesson, D. (2... more Helgesson, Daniel 2021 Link to publication Citation for published version (APA): Helgesson, D. (2021). Exploring grounded theory perspectives of cognitive load in software engineering. Lund University.

arXiv (Cornell University), Mar 12, 2021

Testing of autonomous systems is extremely important as many of them are both safety-critical and... more Testing of autonomous systems is extremely important as many of them are both safety-critical and security-critical. The architecture and mechanism of such systems are fundamentally different from traditional control software, which appears to operate in more structured environments and are explicitly instructed according to the system design and implementation. To gain a better understanding of autonomous systems practice and facilitate research on testing of such systems, we conducted an exploratory study by synthesizing academic literature with a focus group discussion and interviews with industry practitioners. Based on thematic analysis of the data, we provide a conceptualization of autonomous systems, classifications of challenges and current practices as well as of available techniques and approaches for testing of autonomous systems. Our findings also indicate that more research efforts are required for testing of autonomous systems to improve both the quality and safety aspects of such systems.

Testing of autonomous vehicles involves enormous challenges for the automotive industry. The numb... more Testing of autonomous vehicles involves enormous challenges for the automotive industry. The number of real-world driving scenarios is extremely large, and choosing effective test scenarios is essential, as well as combining simulated and real world testing. We present an industrial workbench of tools and workflows to generate efficient and effective test scenarios for active safety and autonomous driving functions. The workbench is based on existing engineering tools, and helps smoothly integrate simulated testing, with real vehicle parameters and software. We aim to validate the workbench with real cases and further refine the input model parameters and distributions.

The Kluwer international series in software engineering, 2000

The use of general descriptive names, registered names, trademarks, service marks, etc. in this p... more The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Motivation. Digital commons is an emerging phenomenon and of increasing importance, as we enter a... more Motivation. Digital commons is an emerging phenomenon and of increasing importance, as we enter a digital society. Open data is one example that makes up a pivotal input and foundation for many of today's digital services and applications. Ensuring sustainable provisioning and maintenance of the data, therefore, becomes even more important. Aim. We aim to investigate how such provisioning and maintenance can be collaboratively performed in the community surrounding a common. Specifically, we look at Open Data Ecosystems (ODEs), a type of community of actors, openly sharing and evolving data on a technological platform. Method. We use Elinor Ostrom's design principles for Common Pool Resources as a lens to systematically analyze the governance of earlier reported cases of ODEs using a theory-oriented software engineering framework. Results. We find that, while natural commons must regulate consumption, digital commons such as open data maintained by an ODE must stimulate both use and data provisioning. Governance needs to enable such stimulus while also ensuring that the collective action can still be coordinated and managed within the frame of available maintenance resources of a community. Subtractability is, in this sense, a concern regarding the resources required to maintain the quality and value of the data, rather than the availability of data. Further, we derive empirically-based recommended practices for ODEs based on the design principles by Ostrom for how to design a governance structure in a way that enables a sustainable and collaborative provisioning and maintenance of the data. Conclusion. ODEs are expected to play a role in data provisioning which democratize the digital society and enables innovation from smaller commercial actors. Our empirically based guidelines intend to support this development. CCS Concepts: • Software and its engineering → Open source model.

Proc. of the 8th International Symposium on Empirical Software Engineering and Measurement, Sep 18, 2014

Proc. of the 17th European Conference on Software Maintenance and Reengineering, Mar 6, 2013

Proceedings of the 7th International Conference on Software Testing, Verification and Validation, Mar 31, 2014

Proc. of the 7th International Symposium on Empirical Software Engineering and Measurement

Background. Test automation is a widely used technique to increase the efficiency of software tes... more Background. Test automation is a widely used technique
to increase the efficiency of software testing. However,
executing more test cases increases the effort required to analyze test results. At Qlik, automated tests run nightly for up to 20 development branches, each containing thousands of test cases, resulting in information overload. Aim. We therefore develop a tool that supports the analysis of test results. Method. We create NIOCAT, a tool that clusters similar test case failures, to help the analyst identify underlying causes. To evaluate the tool, experiments on manually created subsets of failed test cases representing different use cases are conducted, and a focus group meeting is held with test analysts at Qlik. Results. The case study shows that NIOCAT creates accurate clusters, in line with analyses performed by human analysts. Further, the potential time-savings of our approach is confirmed by the participants in the focus group. Conclusions. NIOCAT provides a feasible complement to current automated testing practices at Qlik by reducing information overload.

Software Quality Journal, Dec 3, 2022

Empirical Software Engineering, Aug 18, 2013

Empirical Software Engineering, Mar 17, 2006

arXiv (Cornell University), Mar 12, 2021

The Kluwer international series in software engineering, 2000

Software Quality Journal, Oct 5, 2014

Lecture Notes in Computer Science, 2020

In industry-academia collaboration projects, there are many issues related to different time hori... more In industry-academia collaboration projects, there are many issues related to different time horizons in industry and academia. If not adressed upfront, they may hinder collaboration in such projects. We analyze our experiences from a 10 year industry-academia collaboration program, the EASE Industrial Excellence Center in Sweden, and identify issues and feasible practices to overcome the hurdles of different time horizons. Specifically, we identify issues related to contracts, goals, results, organization (in)stability, and work practices. We identify several areas where the time horizon is different, and conclude that mutual awareness of these differences and management commitment to the collaboration are the key means to overcome the differences. The launch of a mediating institute may also be part of the solution.

arXiv (Cornell University), Oct 11, 2022

Context: Continuous experimentation (CE) is used by many companies with internet-facing products ... more Context: Continuous experimentation (CE) is used by many companies with internet-facing products to improve their software based on user data. Some companies deliberately adopt an experiment-driven approach to software development while some companies use CE in a more ad-hoc fashion. Objective: The goal of the study is to identify factors that explain the variations in the utility and efficacy of CE between different companies. Method: We conducted a multicase study of 12 companies involved with CE and performed 27 interviewees with practitioners at these companies. Based on that empirical data, we then built a theory of factors at play in CE. Results: We introduce a theory of Factors Affecting Continuous Experimentation (FACE). The theory includes three factors, namely 1) processes and infrastructure for CE, 2) the user problem complexity of the product offering, and 3) incentive structures for CE. It explains how these factors affect the effectiveness of CE and its ability to achieve problem-solution and product-market fit. Conclusions: Our theory can be used by practitioners to assess an organisation's potential for adopting CE, as well as, identifying factors which pose challenges in gaining value from CE practices. Our results also provide a starting point for further research on how contextual factors affect CE and how these may be mitigated.

A methodology for developing software intensive systems denoted Cleanroom Software Engineering is... more A methodology for developing software intensive systems denoted Cleanroom Software Engineering is presented. The methodology has been developed at IBM and Software Engineering Technology (SET) in the USA, and is currently being adapted and applied to the field of telecommunications by Q-Labs.The main objective of Cleanroom is to introduce a set of management and engineering techniques which shall form a sound basis for developing zero defect software. The objective of the paper is to give an overview to how Cleanroom can be used and adapted to provide a comprehensive and manageable software engineering process to develop dependable softwaresystems.The emphasis in the paper is on the work made to adapt the development methodology to telecommunications. The adaptations consist of two main areas, i.e. a development method and a certification method. The objective of the development method is to capture several different aspects of the system at an early stage by using different descrip...

Empirical Software Engineering, 2018

Springer eBooks, 2013

Context. Development of safety-critical systems is mostly governed by process-heavy paradigms, wh... more Context. Development of safety-critical systems is mostly governed by process-heavy paradigms, while increasing demands on flexibility and agility also reach this domain. Objectives. We wanted to explore in more detail the industrial needs and challenges when facing this trend. Method. We launched a qualitative survey, interviewing engineers from four companies in four different industry domains. Results. The survey identifies human factors (skills, experience, and attitudes) being key in safety-critical systems development, as well as good documentation. Certification cost is related to change frequency, which is limiting flexibility. Component reuse and iterative processes were found to increase adaptability to changing customer needs. Conclusions. We conclude that agile development and flexibility may co-exist with safetycritical software development, although there are specific challenges to address.

arXiv (Cornell University), Mar 6, 2017

Change Impact Analysis (CIA) during software evolution of safety-critical systems is a labor-inte... more Change Impact Analysis (CIA) during software evolution of safety-critical systems is a labor-intensive task. Several authors have proposed tool support for CIA, but very few tools were evaluated in industry. We present a case study on ImpRec, a recommendation System for Software Engineering (RSSE), tailored for CIA at a process automation company. ImpRec builds on assisted tracing, using information retrieval solutions and mining software repositories to recommend development artifacts, potentially impacted when resolving incoming issue reports. In contrast to the majority of tools for automated CIA, ImpRec explicitly targets development artifacts that are not source code. We evaluate ImpRec in a two-phase study. First, we measure the correctness of ImpRec's recommendations by a simulation based on 12 years' worth of issue reports in the company. Second, we assess the utility of working with ImpRec by deploying the RSSE in two development teams on different continents. The results suggest that ImpRec presents about 40% of the true impact among the top-10 recommendations. Furthermore, user log analysis indicates that ImpRec can support CIA in industry, and developers acknowledge the value of ImpRec in interviews. In conclusion, our findings show the potential of reusing traceability associated with developers' past activities in an RSSE.

Bug report assignment is an important part of software maintenance. In particular, incorrect assi... more Bug report assignment is an important part of software maintenance. In particular, incorrect assignments of bug reports to development teams can be very expensive in large software development projects. Several studies propose automating bug assignment techniques using machine learning in open source software contexts, but no study exists for large-scale proprietary projects in industry. The goal of this study is to evaluate automated bug assignment techniques that are based on machine learning classification. In particular, we study the state-of-the-art ensemble learner Stacked Generalization (SG) that combines several classifiers. We collect more than 50,000 bug reports from five development projects from two companies in different domains. We implement automated bug assignment and evaluate the performance in a set of controlled experiments. We show that SG scales to large scale industrial application and that it outperforms the use of individual classifiers for bug assignment, reaching prediction accuracies from 50 % to 89 % when large training sets are used. In addition, we show how old training data can decrease the prediction accuracy of bug assignment. We advice industry to use SG for bug assignment in proprietary contexts, using at least 2,000 bug reports for training. Finally, we highlight the importance of not solely relying on results from cross-validation when evaluating automated bug assignment.

Empirical Software Engineering, 2014

Weak alignment of requirements engineering (RE) with verification and validation (VV) may lead to... more Weak alignment of requirements engineering (RE) with verification and validation (VV) may lead to problems in delivering the required products in time with the right quality. For example, weak communication of requirements changes to testers may result in
lack of verification of new requirements and incorrect verification of old invalid requirements, leading to software quality problems, wasted effort and delays. However, despite the serious implications of weak alignment research and practice both tend to focus on one or the other of RE or VV rather than on the alignment of the two.We have performed a multi-unit case study to gain
insight into issues around aligning RE and VV by interviewing 30 practitioners from 6 software developing companies, involving 10 researchers in a flexible research process for case studies.
The results describe current industry challenges and practices in aligning RE with VV, ranging from quality of the individual RE and VVactivities, through tracing and tools, to change control and sharing a common understanding at strategy, goal and design level. The study identified that human aspects are central, i.e. cooperation and communication, and that requirements engineering practices are a critical basis for alignment. Further, the size of an organisation and its motivation for applying alignment practices, e.g. external enforcement of traceability, are
variation factors that play a key role in achieving alignment. Our results provide a strategic roadmap for practitioners improvement work to address alignment challenges. Furthermore, the
study provides a foundation for continued research to improve the alignment of RE with VV.

Empirical Software Engineering, 2014

Engineers in large-scale software development have to manage large amounts of information, spread... more Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e. trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicate that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing less than 500 artifacts. Our review identified a need of industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding both context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery.