Code Coverage for Suite Evaluation by Developers
Related papers
Using mutation analysis for assessing and comparing testing coverage criteria
IEEE Transactions on Software Engineering, 2006
The empirical assessment of test techniques plays an important role in software testing research. One common practice is to instrument faults in subject software, either manually or by using a program that generates all possible mutants based on a set of mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults, thus facilitating the statistical analysis of the fault detection effectiveness of test suites; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. Focusing on four common control and data flow criteria (Block, Decision, C-use, P-use), this paper investigates this important issue based on a mid-size industrial program with a comprehensive pool of test cases and known faults. Based on the data available thus far, the results are very consistent across the investigated criteria: they show that the use of mutation operators yields trustworthy results, as generated mutants can be used to predict the detection effectiveness of real faults. Applying such mutation analysis, we then investigate the relative cost and effectiveness of the above-mentioned criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage. Although such questions have been partially investigated in previous studies, we use a large number of mutants, which helps decrease the impact of random variation in our analysis and allows us to use a different analysis approach. Our results are then compared with published studies, plausible reasons for the differences are provided, and this leads us to suggest a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.
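As a minimal sketch of the mutant-seeding idea the abstract describes (not the paper's actual tooling), the following applies one classic mutation operator, relational-operator replacement, to a toy `clamp` function and computes the resulting mutation score; the function and tests are invented for illustration:

```python
import ast

# Hypothetical subject program: a tiny function we mutate and test.
SRC = ("def clamp(x, lo, hi):\n"
       "    if x < lo:\n"
       "        return lo\n"
       "    if x > hi:\n"
       "        return hi\n"
       "    return x\n")

class RelationalMutator(ast.NodeTransformer):
    """One classic mutation operator: swap < and > at one location."""
    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            if isinstance(op, (ast.Lt, ast.Gt)):
                if self.seen == self.target_index:
                    node.ops[i] = ast.Gt() if isinstance(op, ast.Lt) else ast.Lt()
                self.seen += 1
        return node

def make_mutants(src):
    """Generate one mutant per relational-operator occurrence."""
    n = sum(isinstance(op, (ast.Lt, ast.Gt))
            for node in ast.walk(ast.parse(src))
            if isinstance(node, ast.Compare) for op in node.ops)
    mutants = []
    for k in range(n):
        tree = RelationalMutator(k).visit(ast.parse(src))
        mutants.append(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"))
    return mutants

def mutation_score(mutants, tests):
    """Fraction of mutants detected (killed) by the test suite."""
    killed = 0
    for code in mutants:
        env = {}
        exec(code, env)
        if any(env["clamp"](*args) != expected for args, expected in tests):
            killed += 1
    return killed / len(mutants)

tests = [((5, 0, 10), 5), ((-1, 0, 10), 0), ((99, 0, 10), 10)]
print(mutation_score(make_mutants(SRC), tests))  # 1.0: both mutants killed
```

In the paper's terms, the score over such generated mutants serves as a proxy for the suite's probability of detecting real faults.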
Quality Metrics of Test Suites in Test-Driven Designed Applications
International Journal of Software Engineering & Applications, 2018
New techniques for writing and developing software have evolved in recent years. One is Test-Driven Development (TDD), in which tests are written before code: no code should be written without first having a test to execute it. Thus, in terms of code coverage, the quality of test suites written using TDD should be high. In this work, we analyze applications written using TDD and traditional techniques. Specifically, we assess the quality of the associated test suites based on two quality metrics: 1) a structure-based criterion, and 2) a fault-based criterion. We find that test suites with high branch coverage also have high mutation scores, especially in the case of TDD applications. We conclude that Test-Driven Development is an effective approach that improves the quality of the test suite, covering more of the source code and also revealing more faults.
On guiding the augmentation of an automated test suite via mutation analysis
EMPIRICAL SOFTWARE ENGINEERING, 2009
Mutation testing has traditionally been used as a defect injection technique to assess the effectiveness of a test suite as represented by a "mutation score." Recently, mutation test tools have become more efficient, and research in mutation analysis is experiencing growth. Mutation analysis entails adding or modifying test cases until the test suite is sufficient to detect as many mutants as possible and the mutation score is satisfactory. The augmented test suite resulting from mutation analysis may reveal latent faults and provides a stronger test suite to detect future errors which might be injected. Software engineers often look for guidance on how to augment their test suite using information provided by statement and/or branch coverage tools. As the use of mutation analysis grows, software engineers will want to know how the emerging technique compares with and/or complements coverage analysis for guiding the augmentation of an automated test suite. Additionally, software engineers can benefit from an enhanced understanding of efficient mutation analysis techniques. To address this need for additional information, we conducted an empirical study of the use of mutation analysis on two open source projects. Our results indicate that a focused effort on increasing mutation score leads to a corresponding increase in statement and branch coverage to the point that all three measures reach a maximum but leave some types of code structures uncovered.
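The augmentation loop the abstract describes can be sketched as a greedy procedure over a hypothetical kill matrix (which tests kill which mutants): repeatedly add the candidate test that kills the most surviving mutants until the mutation score is satisfactory. The data below is invented for illustration.

```python
def augment(suite, candidates, kills, target_score, total_mutants):
    """Greedily grow `suite` from `candidates` until the mutation score
    reaches `target_score`. `kills` maps test name -> set of mutant ids."""
    dead = set().union(*(kills[t] for t in suite)) if suite else set()
    suite = list(suite)
    while len(dead) / total_mutants < target_score:
        # Pick the candidate that kills the most still-surviving mutants.
        best = max(candidates, key=lambda t: len(kills[t] - dead), default=None)
        if best is None or not (kills[best] - dead):
            break  # no candidate kills anything new
        suite.append(best)
        dead |= kills[best]
        candidates = [t for t in candidates if t != best]
    return suite, len(dead) / total_mutants

kills = {"t1": {1, 2}, "t2": {2, 3}, "t3": {3, 4, 5}, "t4": {1}}
suite, score = augment(["t1"], ["t2", "t3", "t4"], kills, 0.8, 5)
print(suite, score)  # ['t1', 't3'] 1.0 -- t3 alone covers mutants 3, 4, 5
```

In practice the kill matrix comes from running each candidate test against each mutant, which is exactly the expensive step that efficient mutation-analysis techniques try to reduce.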
Comparing non-adequate test suites using coverage criteria
Proceedings of the 2013 International Symposium on Software Testing and Analysis, 2013
A fundamental question in software testing research is how to compare test suites, often as a means for comparing test-generation techniques. Researchers frequently compare test suites by measuring their coverage. A coverage criterion C provides a set of test requirements and measures how many requirements a given suite satisfies. A suite that satisfies 100% of the (feasible) requirements is C-adequate. Previous rigorous evaluations of coverage criteria mostly focused on such adequate test suites: given criteria C and C′, are C-adequate suites (on average) more effective than C′-adequate suites? However, in many realistic cases producing adequate suites is impractical or even impossible. We present the first extensive study that evaluates coverage criteria for the common case of non-adequate test suites: given criteria C and C′, which one is better to use to compare test suites? Namely, if suites T1, T2, …, Tn have coverage values c1, c2, …, cn for C and c′1, c′2, …, c′n for C′, is it better to compare suites based on c1, c2, …, cn or based on c′1, c′2, …, c′n? We evaluate a large set of plausible criteria, including statement and branch coverage, as well as stronger criteria used in recent studies. Two criteria perform best: branch coverage and an intra-procedural acyclic path coverage.
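The paper's core question can be illustrated with invented numbers: rank the suites by their coverage values under each criterion and check which ranking agrees better with actual fault detection, here via a hand-rolled Kendall's tau:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Rank agreement in [-1, 1] between two paired value lists
    (concordant minus discordant pairs, over all pairs)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs

# Coverage of five suites under criterion C (branch) and C' (statement),
# plus the fraction of faults each suite actually detects -- all made up.
branch    = [0.40, 0.55, 0.60, 0.75, 0.90]
statement = [0.70, 0.65, 0.80, 0.85, 0.95]
faults    = [0.20, 0.35, 0.40, 0.60, 0.80]

print(kendall_tau(branch, faults), kendall_tau(statement, faults))  # 1.0 0.8
```

Under these toy numbers, branch coverage orders the suites exactly as fault detection does, while statement coverage misorders one pair, which is the shape of the comparison the study performs at scale.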
Intelligent evaluation of test suites for developing efficient and reliable software
International Journal of Parallel, Emergent and Distributed Systems, 2019
Test suites play an important role in developing reliable software applications. Generally, the behaviour of software applications is verified by executing test suites to find defects. The quality of a test suite needs to be evaluated and enriched (if needed), especially for testing critical systems such as a plane-navigation system. This paper presents a novel method for comparing concrete, executable test suites using equivalence classes. This comparison identifies gaps in test suites with respect to each other; these gaps indicate potential weaknesses in the test suites. Furthermore, the method provides a mechanism to enrich the test suites using these gaps. In this method, we devise equivalence classes and associate each test case with an equivalence class. We then simulate the comparison of test suites by comparing sets of equivalence classes. The method compares test suites in a platform-independent manner. The compared test suites are smaller than the originals because redundant test cases are removed, which makes the method efficient. We exercise our method over three case studies to demonstrate its viability and effectiveness. The first case study illustrates the application of the method and evaluates its effectiveness using mutation analysis. The second evaluates its effectiveness using mutation and coverage analyses. The final case study evaluates it on a real system, the Lucene search engine.
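The equivalence-class comparison can be sketched as follows; the classifier and test inputs here are invented (the paper derives its own equivalence classes), but the mechanics are the same: map each test case to a class, compare suites as sets of classes, and read the set differences as gaps.

```python
def to_classes(suite, classify):
    """Collapse a test suite to the set of equivalence classes it covers;
    redundant test cases in the same class disappear automatically."""
    return {classify(case) for case in suite}

def gaps(suite_a, suite_b, classify):
    """Classes covered by one suite but not the other."""
    a, b = to_classes(suite_a, classify), to_classes(suite_b, classify)
    return a - b, b - a

# Toy classifier for an integer-input program: the sign of the input.
classify = lambda x: "neg" if x < 0 else ("zero" if x == 0 else "pos")

suite_a = [-5, -1, 3, 7]   # covers neg, pos
suite_b = [0, 2]           # covers zero, pos
missing_in_b, missing_in_a = gaps(suite_a, suite_b, classify)
print(missing_in_b, missing_in_a)  # {'neg'} {'zero'}
```

Each reported gap suggests a concrete enrichment: add to the weaker suite a test case drawn from the missing class.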
IEICE Transactions on Information and Systems, 2011
Kazunori Sakamoto, Fuyuki Ishikawa, Hironori Washizaki, and Yoshiaki Fukazawa
Test coverage is an important indicator of whether software has been sufficiently tested. However, there are several problems with the existing measurement tools for test coverage, such as their cost of development and maintenance, inconsistency, and inflexibility in measurement. We propose a consistent and flexible measurement framework for test coverage that we call the Open Code Coverage Framework (OCCF). It supports multiple programming languages by extracting the commonalities among them using an abstract syntax tree, which helps in developing test-coverage measurement tools for new programming languages. OCCF allows users to add programming-language support independently of the test-coverage criteria, and to add test-coverage-criteria support independently of programming languages, in order to take consistent measurements in each language. Moreover, OCCF provides two methods for changing the measurement range and elements, using XPath and user-added code, in order to make more flexible measurements. We implemented a sample tool for C, Java, and Python using OCCF that can measure four test-coverage criteria, and confirmed that OCCF can also support C#, Ruby, JavaScript, and Lua. Moreover, OCCF reduced the lines of code (LOC) required to implement test-coverage measurement tools by approximately 90% and the time to implement a new test-coverage criterion by over 80% in an experiment comparing OCCF with conventional non-framework-based tools.
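OCCF itself instruments code at a language-independent AST level; as a single-language contrast to what such tools measure, a minimal line-coverage measurement can be written in pure Python with the interpreter's trace hook (the function under test is a toy example):

```python
import sys

def measure_line_coverage(fn, *args):
    """Run fn(*args) and return the set of executed line offsets within fn."""
    executed = set()
    def tracer(frame, event, arg):
        # Record 'line' events, but only for the function we are measuring.
        if event == "line" and frame.f_code is fn.__code__:
            executed.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)  # always uninstall the hook
    first = fn.__code__.co_firstlineno
    return {ln - first for ln in executed}

def absolute_value(x):      # offset 0 (def line, not executed)
    if x < 0:               # offset 1
        return -x           # offset 2
    return x                # offset 3

print(measure_line_coverage(absolute_value, -3))  # {1, 2}
print(measure_line_coverage(absolute_value, 3))   # {1, 3}
```

The per-language, per-criterion nature of hooks like this is exactly the duplication OCCF factors out into a shared framework.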
A Comparative Evaluation of Test Coverage Techniques Effectiveness
Journal of Software Engineering and Applications
Software systems have become complex and challenging to develop and maintain because of the large size of test cases with increased scalability issues. Test case prioritization methods have been successfully utilized in test case management; however, the prohibitively exorbitant cost of large test suites is now a mainstream concern in the software industry. The growth of agile test-driven development has increased the expectations for software quality, yet our knowledge of when to use various path testing criteria for cost-effectiveness is inadequate due to the inherent complexity of software testing. Existing research has attempted to address the issue without effectively tackling the scalability of large test suites to reduce time in regression testing. In order to provide a more accurate way of detecting faults in software projects, we introduce a novel coverage criterion, called Incremental Cluster-based test case Prioritization (ICP), and investigate its potential by making a comparative evaluation with three un-clustered traditional coverage-based criteria: Prime-Path Coverage (PPC), Edge-Pair Coverage (EPC), and Edge Coverage (EC), based on mutation analysis. By clustering test suites based on their dynamic run-time behavior, the number of pair-wise comparisons is reduced significantly. To compare, we analyzed 20 functions from 25 C programs, instrumented faults into the programs, used the Mull mutation tool to generate mutants, and performed a statistical analysis of the results. The experimental results show that ICP can lead to cost-effective improvements in fault detection.
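The clustering step behind ICP-style prioritization can be sketched as grouping tests by identical dynamic behaviour (here, hypothetical branch traces) and keeping one representative per cluster, which is what shrinks the number of pair-wise comparisons:

```python
from collections import defaultdict

def cluster_by_behaviour(branch_traces):
    """branch_traces: test name -> tuple of covered branch ids.
    Tests with the same covered-branch set land in the same cluster."""
    clusters = defaultdict(list)
    for test, trace in branch_traces.items():
        clusters[frozenset(trace)].append(test)
    return list(clusters.values())

def representatives(clusters):
    # One test per cluster; here simply the lexicographically first.
    return sorted(min(c) for c in clusters)

traces = {
    "t1": ("b1", "b2"),
    "t2": ("b2", "b1"),   # same behaviour as t1 -> same cluster
    "t3": ("b3",),
    "t4": ("b1", "b3"),
}
clusters = cluster_by_behaviour(traces)
print(len(clusters), representatives(clusters))  # 3 ['t1', 't3', 't4']
```

Pair-wise comparison then runs over 3 representatives instead of 4 tests; with realistic suites the reduction is far larger, which is where the claimed cost-effectiveness comes from.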
An Empirical Analysis of the Correlation between CK Metrics, Test Coverage and Mutation Score
Proceedings of the 19th International Conference on Enterprise Information Systems
In this paper we investigate the correlation between test coverage, mutation score, and object-oriented system metrics. First, we conducted a literature review to obtain an initial model of testability and the existing object-oriented metrics related to it. We then selected four open-source systems whose test cases were available and calculated the correlation between the collected metrics and line coverage, branch coverage, and mutation score. Preliminary results show that some CK metrics, which are strongly related to a system's design, mainly influence line coverage and mutation score, and thus can influence a system's testability.
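The correlation analysis itself is straightforward; here is a sketch with invented numbers, computing Pearson's r between a CK metric (WMC, weighted methods per class) and line coverage:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented per-class data: WMC against line coverage of that class's tests.
wmc           = [5, 12, 20, 33, 41]
line_coverage = [0.92, 0.85, 0.70, 0.55, 0.40]

r = pearson(wmc, line_coverage)
print(round(r, 3))  # strongly negative: more complex classes, less coverage
```

A strong negative r of this kind is the pattern the paper's preliminary results point to for some CK metrics.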
Impact of Test Suite Coverage on Overfitting in Genetic Improvement of Software
Search-Based Software Engineering, 2020
Genetic Improvement (GI) uses automated search to improve existing software. It can be used to improve runtime or energy consumption, fix bugs, and improve any other software property, provided that the property can be encoded into a fitness function. GI usually relies on testing to check whether changes disrupt the intended functionality of the software, which makes test suites important artefacts for the overall success of GI. The objective of this work is to establish which characteristics of test suites correlate with the effectiveness of GI. We hypothesise that different test suite properties may have different levels of correlation with the ratio between overfitting and non-overfitting patches generated by the GI algorithm. To test our hypothesis, we perform a set of experiments with test suites automatically generated using EvoSuite and 4 popular coverage criteria. We used these test suites as input to a GI process and collected the patches generated throughout that process. We find that while test suite coverage has an impact on the ability of GI to produce correct patches, with branch coverage leading to the least overfitting, the overfitting rate was still significant. We also compared automatically generated tests with manual, developer-written ones and found that while the manual tests had lower coverage, GI runs with manual tests led to less overfitting than runs with automatically generated tests. Finally, we did not observe statistically significant correlations between the coverage metrics and the overfitting ratios of patches, i.e., the coverage of a test suite cannot be used as a linear predictor for the level of overfitting of the generated patches.
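The overfitting notion used in such GI studies can be sketched as a simple classification of patches against training versus held-out tests; the patches and tests below are toy stand-ins for GI-generated edits and EvoSuite/manual suites:

```python
def passes(patch_fn, suite):
    """A patch passes a suite if it produces every expected output."""
    return all(patch_fn(x) == expected for x, expected in suite)

def classify_patch(patch_fn, training_suite, held_out_suite):
    if not passes(patch_fn, training_suite):
        return "rejected"       # fails the fitness tests; GI never reports it
    if passes(patch_fn, held_out_suite):
        return "correct"
    return "overfitting"        # fits the training tests, not the spec

# Toy target behaviour: absolute value. Training tests miss negatives.
training = [(0, 0), (3, 3), (7, 7)]
held_out = [(-4, 4), (-1, 1), (2, 2)]

overfit_patch = lambda x: x                  # identity passes training only
correct_patch = lambda x: x if x >= 0 else -x

print(classify_patch(overfit_patch, training, held_out))  # overfitting
print(classify_patch(correct_patch, training, held_out))  # correct
```

Higher-coverage training suites leave fewer such blind spots, which is why coverage plausibly relates to the overfitting ratio even though, per the paper, it is not a reliable linear predictor of it.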