Lionel Briand | Université du Luxembourg

Papers by Lionel Briand

Empirical evaluations on the cost-effectiveness of state-based testing: An industrial case study

Information and Software Technology, 2014

Context: Test models describe the expected behavior of the software under test and provide the basis for test case and oracle generation. When test models are expressed as UML state machines, this is typically referred to as state-based testing (SBT). Despite the importance of being systematic while testing, all testing activities are limited by resource constraints. Thus, reducing the cost of testing while ensuring sufficient fault detection is a common goal in software development. No rigorous industrial case studies of SBT have yet been published. Objective: In this paper, we evaluate the cost-effectiveness of SBT on actual control software by studying the combined influence of four testing aspects: coverage criterion, test oracle, test model and unspecified behavior (sneak paths). Method: An industrial case study was used to investigate the cost-effectiveness of SBT. To enable the evaluation of SBT techniques, a model-based testing tool was configured and used to automatically generate test suites. The test suites were evaluated using 26 real faults collected in a field study. Results: Results show that the more detailed and rigorous the test model and oracle, the higher the fault-detection ability of SBT. A less precise oracle achieved 67% fault detection, but the overall cost reduction of 13% was not enough to make the loss an acceptable trade-off. Removing details from the test model significantly reduced the cost by 85%. Interestingly, only a 24-37% reduction in fault detection was observed. Testing for sneak paths killed the remaining eleven mutants that could not be killed by the conformance test strategies. Conclusions: Each of the studied testing aspects influences cost-effectiveness and must be carefully considered in context when selecting strategies. Regardless of these choices, sneak-path testing is a necessary step in SBT since sneak paths are common while also undetectable by conformance testing.
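
As a rough illustration of the sneak-path idea described in this abstract, the sketch below derives sneak-path test obligations from a toy state machine: every (state, event) pair that the specification leaves undefined becomes a test that checks the software ignores or rejects the event. The state machine, event set, and function names are invented for illustration; this is not the model-based testing tool used in the study.

```python
# Minimal sneak-path sketch: a state machine as {state: {event: next_state}};
# any (state, event) pair missing from the specification is a sneak-path test.
SPEC = {
    "Idle":    {"start": "Running"},
    "Running": {"pause": "Paused", "stop": "Idle"},
    "Paused":  {"resume": "Running", "stop": "Idle"},
}
EVENTS = {"start", "pause", "resume", "stop"}

def sneak_path_tests(spec, events):
    """Return (state, event) pairs with no specified transition.

    Each pair becomes a test: drive the system to `state`, fire `event`,
    and check that the event is ignored or rejected (no hidden transition).
    """
    return [(s, e) for s, out in spec.items() for e in sorted(events - out.keys())]

for state, event in sneak_path_tests(SPEC, EVENTS):
    print(f"in {state!r}, unspecified event {event!r} must be ignored or rejected")
```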

Test case selection for black-box regression testing of database applications

Information and Software Technology, 2013

Context: This paper presents an approach for selecting regression test cases in the context of large-scale database applications. We focus on a black-box (specification-based) approach, relying on classification tree models to model the input domain of the system under test (SUT), in order to obtain a more practical and scalable solution. We perform an experiment in an industrial setting where the SUT is a large database application in Norway’s tax department. Objective: We investigate the use of similarity-based test case selection for supporting black-box regression testing of database applications. We have developed a practical approach and tool (DART) for functional black-box regression testing of database applications. In order to make the regression test approach scalable for large database applications, we needed a test case selection strategy that reduces the test execution costs and analysis effort. We used classification tree models to partition the input domain of the SUT in order to then select test cases. Rather than selecting test cases at random from each partition, we incorporated a similarity-based test case selection, hypothesizing that it would yield a higher fault detection rate. Method: An experiment was conducted to determine which similarity-based selection algorithm was the most suitable in selecting test cases in large regression test suites, and whether similarity-based selection was a worthwhile and practical alternative to simpler solutions. Results: The results show that combining similarity measurement with partition-based test case selection, by using similarity-based test case selection within each partition, can provide improved fault detection rates over simpler solutions when specific conditions are met regarding the partitions. Conclusions: Under the conditions present in the experiment the improvements were marginal. However, a detailed analysis concludes that the similarity-based selection strategy should be applied when a large number of test cases are contained in each partition and there is significant variability within partitions. If these conditions are not present, incorporating similarity measures is not worthwhile, since the gain is negligible over a random selection within each partition.
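
To make the "similarity-based selection within each partition" strategy concrete, here is a minimal sketch of one plausible variant: test cases are represented by the set of classification-tree categories they cover, and within a partition the tests whose minimum distance to the already-selected tests is largest are picked greedily. The distance function, data, and greedy scheme are illustrative assumptions, not necessarily the algorithms compared in the DART experiment.

```python
# Greedy diversity-based selection inside one partition (illustrative only).
def jaccard_distance(a, b):
    union = a | b
    return 1.0 if not union else 1.0 - len(a & b) / len(union)

def select_diverse(partition, budget):
    """Pick `budget` tests, maximizing the minimum distance to those already chosen."""
    selected = [partition[0]]                      # arbitrary seed test
    remaining = list(partition[1:])
    while remaining and len(selected) < budget:
        best = max(remaining,
                   key=lambda t: min(jaccard_distance(t, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# each test case = the set of input categories it exercises (fictitious)
partition = [frozenset(f) for f in (
    {"single", "resident"}, {"single", "nonresident"},
    {"married", "resident"}, {"married", "resident", "pension"})]
print(select_diverse(partition, budget=2))
```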

An assessment and comparison of common software cost estimation modeling techniques

This paper investigates two essential questions related to data-driven software cost modeling: (1) What modeling techniques are likely to yield more accurate results when using typical software development cost data? and (2) What are the benefits and drawbacks of using organization-specific data as compared to multi-organization databases? The former question is important in guiding software cost analysts in their choice of the right type of modeling technique, if at all possible. In order to address this issue, we assess and compare a selection of common cost modeling techniques fulfilling a number of important criteria using a large multi-organizational database in the business application domain. Namely, these are: ordinary least squares regression, stepwise ANOVA, CART, and analogy. The latter question is important in order to assess the feasibility of using multi-organization cost databases to build cost models and the benefits gained from local, company-specific data collection and modeling. As a large subset of the data in the multi-company database came from one organization, we were able to investigate this issue by comparing organization-specific models with models based on multi-organization data. Results show that the performances of the modeling techniques considered were not significantly different, with the exception of the analogy-based models, which appear to be less accurate. Surprisingly, when using standard cost factors (e.g., COCOMO-like factors, Function Points), organization-specific models did not yield better results than generic, multi-organization models.
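
The accuracy comparison described here rests on an evaluation protocol that is easy to show in miniature: hold each project out, predict its effort from the remaining projects, and summarize the magnitude of relative errors (e.g., MMRE and Pred(25)). The sketch below applies that protocol to two trivial baseline predictors on fabricated effort data; it only illustrates the evaluation machinery, not the regression, ANOVA, CART, or analogy models assessed in the paper.

```python
# Leave-one-out evaluation of cost predictors with MMRE and Pred(25) (illustrative).
import statistics

def mre(actual, predicted):
    return abs(actual - predicted) / actual

def loo_evaluate(effort, predict):
    """Predict each project from the remaining ones; return (MMRE, Pred(25))."""
    errors = [mre(actual, predict(effort[:i] + effort[i + 1:]))
              for i, actual in enumerate(effort)]
    return statistics.mean(errors), sum(e <= 0.25 for e in errors) / len(errors)

effort = [120, 340, 95, 410, 220, 180, 510, 260]      # person-months (fictitious)
for name, baseline in [("mean", statistics.mean), ("median", statistics.median)]:
    mmre, pred25 = loo_evaluate(effort, baseline)
    print(f"{name:>6}-based baseline: MMRE={mmre:.2f}, Pred(25)={pred25:.0%}")
```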

A Comprehensive Evaluation of Capture-Recapture Models for Estimating Software Defect Content

IEEE Transactions on Software Engineering, 2000

An important requirement to control the inspection of software artifacts is to be able to decide, based on more objective information, whether the inspection can stop or whether it should continue to achieve a suitable level of artifact quality. A prediction of the number of remaining defects in an inspected artifact can be used for decision making. Several studies in software engineering have considered capture-recapture models, originally proposed by biologists to estimate animal populations, to make such a prediction. However, few studies compare the actual number of remaining defects to the one predicted by a capture-recapture model on real software engineering artifacts. Thus, there is little work looking at the robustness of capture-recapture models under realistic software engineering conditions, where it is expected that some of their assumptions will be violated. Simulations have been performed, but no definite conclusions can be drawn regarding the degree of accuracy of such models under realistic inspection conditions, and the factors affecting this accuracy. Furthermore, the existing studies focused on a subset of the existing capture-recapture models. Thus, a more exhaustive comparison is still missing. In this study, we focus on traditional inspections and estimate, based on actual inspection data, the degree of accuracy of relevant, state-of-the-art capture-recapture models, as they have been proposed in biology and for which statistical estimators exist. In order to assess their robustness, we look at the impact of the number of inspectors and the number of actual defects on the estimators' accuracy based on actual inspection data. Our results show that models are strongly affected by the number of inspectors and, therefore, one must consider this factor before using capture-recapture models. When the number of inspectors is too small, no model is sufficiently accurate and underestimation may be substantial. In addition, some models perform better than others in a large number of conditions, and plausible reasons are discussed. Based on our analyses, we recommend using a model taking into account that defects have different probabilities of being detected, together with the corresponding Jackknife estimator. Furthermore, we attempt to calibrate the prediction models based on their relative error, as previously computed on other inspections. Although intuitive and straightforward, this approach has theoretical limitations, which we identified and which were then confirmed by the data.
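
The recommended model family (defects with heterogeneous detection probabilities) is typically paired with the jackknife estimator, whose first-order form is simple enough to show directly: N_hat = S + ((k - 1)/k) * f1, where S is the number of distinct defects found, k the number of inspectors, and f1 the number of defects found by exactly one inspector. The sketch below computes it on fabricated inspection data; it is a textbook illustration, not the study's estimation code.

```python
# First-order jackknife estimate of total defect content (illustrative).
def jackknife_estimate(detections):
    """detections: one set of defect ids per inspector."""
    k = len(detections)
    found = set().union(*detections)
    f1 = sum(1 for d in found if sum(d in insp for insp in detections) == 1)
    return len(found) + (k - 1) / k * f1

inspectors = [{1, 2, 3, 5}, {2, 3, 4}, {1, 3, 6}]     # fabricated inspection data
estimate = jackknife_estimate(inspectors)
print(f"{len(set().union(*inspectors))} distinct defects found, "
      f"estimated total ≈ {estimate:.1f}")             # 6 found, ≈ 8.0 estimated
```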

A UML-Based Approach to System Testing

Software and System Modeling, 2002

System testing is concerned with testing an entire system based on its specifications. In the context of object-oriented, UML development, this means that system test requirements are derived from UML analysis artifacts such as use cases, their corresponding sequence and collaboration diagrams, class diagrams, and possibly Object Constraint Language (OCL) expressions across all these artifacts. Our goal here is to support the derivation of functional system test requirements, which will be transformed into test cases, test oracles, and test drivers once we have detailed design information. In this paper, we describe a methodology in a practical way and illustrate it with an example. In this context, we address testability and automation issues, as the ultimate goal is to fully support system testing activities with high-capability tools.

Impact Analysis and Change Management of UML Models

The use of Unified Modeling Language (UML) analysis/design models on large projects leads to a large number of interdependent UML diagrams. As software systems evolve, those diagrams undergo changes to, for instance, correct errors or address changes in the requirements. Those changes can in turn lead to subsequent changes to other elements in the UML diagrams. Impact analysis is then defined as the process of identifying the potential consequences (side-effects) of a change, and estimating what needs to be modified to accomplish a change. In this article, we propose a UML model-based approach to impact analysis that can be applied before any implementation of the changes, thus allowing an early decision-making and change planning process. We first verify that the UML diagrams are consistent (consistency check). Then changes between two different versions of a UML model are identified according to a change taxonomy, and model elements that are directly or indirectly impacted by those changes (i.e., may undergo changes) are determined using formally defined impact analysis rules (written with the Object Constraint Language). A measure of distance between a changed element and potentially impacted elements is also proposed to prioritize the results of impact analysis according to their likelihood of occurrence. We also present a prototype tool that provides automated support for our impact analysis strategy, which we then apply to a case study to validate both the implementation and the methodology.
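
The distance measure used to prioritize impact analysis results can be pictured as the number of rule applications separating a potentially impacted element from a changed one. The sketch below approximates this with a breadth-first traversal over a toy dependency graph, ranking impacted elements by that distance; the element names and graph are invented, and the paper's actual rules are formalized in OCL over the UML metamodel rather than as a plain graph walk.

```python
# Distance-ranked impact set from a set of changed elements (illustrative).
from collections import deque

DEPENDS_ON = {                       # element -> elements that depend on it
    "Account":   ["Customer", "Statement"],
    "Customer":  ["Order"],
    "Statement": [],
    "Order":     ["Invoice"],
    "Invoice":   [],
}

def impacted_with_distance(changed):
    dist = {e: 0 for e in changed}
    queue = deque(changed)
    while queue:
        element = queue.popleft()
        for dependent in DEPENDS_ON.get(element, []):
            if dependent not in dist:
                dist[dependent] = dist[element] + 1
                queue.append(dependent)
    # impacted elements only, closest (most likely to change) first
    return sorted(((e, d) for e, d in dist.items() if d > 0), key=lambda x: x[1])

print(impacted_with_distance({"Account"}))
# [('Customer', 1), ('Statement', 1), ('Order', 2), ('Invoice', 3)]
```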

A UML-Based Approach to System Testing

System testing is concerned with testing an entire system based on its specifications. In the context of object-oriented, UML development, this means that system test requirements are derived from UML analysis artifacts such as use cases, their corresponding sequence and collaboration diagrams, class diagrams, and possibly the use of the Object Constraint Language across all these artifacts. Our goal is to support the derivation of test requirements, which will be transformed into test cases, test oracles, and test drivers once we have detailed design information. Another important issue we address is testability. Testability requirements (or rules) need to be imposed on UML artifacts so as to be able to support system testing efficiently. Those testability requirements result from a trade-off between analysis and design overhead and improved testability. The potential for automation is also an overriding concern throughout our work, as the ultimate goal is to fully support testing activities with high-capability tools.

Is mutation an appropriate tool for testing experiments?

The empirical assessment of test techniques plays an important role in software testing research. One common practice is to instrument faults, either manually or by using mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. This paper investigates this important question based on a number of programs with comprehensive pools of test cases and known faults. It is concluded that, based on the data available thus far, the use of mutation operators yields trustworthy results (generated mutants are similar to real faults). Mutants, however, appear to be different from hand-seeded faults, which seem to be harder to detect than real faults.
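
The comparison at the heart of this question can be shown in a few lines: for a sample of test suites, measure the proportion of mutants each suite kills and the proportion of real faults it detects, and check whether the two quantities track each other. The kill data below is fabricated purely to show the shape of the analysis.

```python
# Per-suite mutation score vs. real-fault detection rate (fabricated data).
mutants_killed = [{1, 2, 3}, {1, 2, 3, 4, 5}, {2}, {1, 2, 3, 4, 5, 6, 7}]
faults_found   = [{10, 11},  {10, 11, 12},    {11}, {10, 11, 12, 13}]
N_MUTANTS, N_FAULTS = 8, 4

for suite, (killed, found) in enumerate(zip(mutants_killed, faults_found), start=1):
    print(f"suite {suite}: mutation score {len(killed) / N_MUTANTS:.2f}, "
          f"real-fault detection {len(found) / N_FAULTS:.2f}")
```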

Exploring the relationships between design measures and software quality in object-oriented systems

Journal of Systems and Software, 2000

The first goal of this paper is to empirically explore the relationships between existing object-oriented coupling, cohesion, and inheritance measures and the probability of fault detection in system classes during testing. In other words, we wish to better understand the relationship between existing design measurement in OO systems and the quality of the software developed. The second goal is to propose an investigation and analysis strategy to make these kinds of studies more repeatable and comparable, a problem which is pervasive in the literature on quality measurement. Results show that many of the measures capture similar dimensions in the data set, thus reflecting the fact that many of them are based on similar principles and hypotheses. However, it is shown that by using a subset of measures, accurate models can be built to predict which classes contain most of the existing faults. When predicting fault-prone classes, the best model shows a percentage of correct classifications higher than 80% and finds more than 90% of faulty classes. Besides the size of classes, the frequency of method invocations and the depth of inheritance hierarchies seem to be the main driving factors of fault proneness.
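
The two quantities reported for the best model (percentage of correct classifications and percentage of faulty classes found) are straightforward to compute once a prediction rule exists. The toy sketch below evaluates a deliberately simple hand-made rule on fabricated class data; the paper's actual models are built with multivariate analysis over many coupling, cohesion, and inheritance measures.

```python
# Evaluating a fault-proneness classifier: correctness and completeness (illustrative).
classes = [
    # (outgoing method invocations, depth of inheritance, class was faulty)
    (35, 3, True), (5, 1, False), (42, 4, True), (8, 2, False),
    (29, 1, True), (3, 0, False), (18, 5, True), (12, 1, False),
]

def predict_fault_prone(invocations, depth, threshold=20):
    return invocations + 5 * depth > threshold      # toy stand-in for a fitted model

predictions = [predict_fault_prone(i, d) for i, d, _ in classes]
actual      = [faulty for *_, faulty in classes]

correct = sum(p == a for p, a in zip(predictions, actual)) / len(classes)
found   = sum(p and a for p, a in zip(predictions, actual)) / sum(actual)
print(f"correct classifications: {correct:.0%}, faulty classes found: {found:.0%}")
```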

How reuse influences productivity in object-oriented systems

Communications of The ACM, 1996

... Each team was asked to develop a management information system supporting the rental/return process of a hypothetical video rental business and the maintenance of customer and video databases. Such an application domain ...

Using Coupling Measurement for Impact Analysis in Object-Oriented Systems

Many coupling measures have been proposed in the context of object-oriented (OO) systems. In addition, several studies have highlighted the complexity of using dependency analysis in OO software to perform impact analysis. The question is then: can we use simple decision models based on coupling measurement to support impact analysis in OO systems? The main advantages of such an approach are its simplicity and complete automation. To investigate this question, we perform a thorough analysis on a commercial C++ system for which change data has been collected over several years. We identify the coupling dimensions that seem to be significantly related to ripple effects and use them to rank classes according to their probability of containing ripple effects. We then assess the expected effectiveness of such decision models.
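
A decision model of the kind evaluated here can be as simple as ranking classes by a coupling measure and inspecting them from the top down, then asking how quickly the ranking covers the classes that actually contained ripple effects. The coupling values and change data below are fabricated to illustrate that assessment.

```python
# Coupling-based ranking and its coverage of ripple-effect classes (illustrative).
coupling = {"A": 27, "B": 3, "C": 14, "D": 9, "E": 21, "F": 1}
ripple_classes = {"A", "C", "E"}          # classes where ripple effects occurred

ranked = sorted(coupling, key=coupling.get, reverse=True)
for k in range(1, len(ranked) + 1):
    covered = len(set(ranked[:k]) & ripple_classes) / len(ripple_classes)
    print(f"inspect top {k}: {covered:.0%} of ripple-effect classes covered")
```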

Investigating quality factors in object-oriented designs: an industrial case study

This paper aims at empirically exploring the relationships between most of the existing design coupling, cohesion, and inheritance measures for object-oriented (OO) systems, and the fault-proneness of OO system classes. The underlying goal of this study is to better understand the relationship between existing design measurement in OO systems and the quality of the software developed. In addition, we aim at assessing whether such relationships, once modeled, can be used to effectively drive and focus inspections or testing.

Dynamic Coupling Measurement for Object-Oriented Software

IEEE Transactions on Software Engineering, 2004

The relationships between coupling and external quality factors of object-oriented software have been studied extensively for the past few years. For example, several studies have identified clear empirical relationships between class-level coupling and class fault-proneness. A common way to define and measure coupling is through structural properties and static code analysis. However, because of polymorphism, dynamic binding, and the common presence of unused ("dead") code in commercial software, the resulting coupling measures are imprecise as they do not perfectly reflect the actual coupling taking place among classes at run-time. For example, when using static analysis to measure coupling, it is difficult and sometimes impossible to determine what actual methods can be invoked from a client class if those methods are overridden in the subclasses of the server classes. Coupling measurement has traditionally been performed using static code analysis, because most of the existing work was done on non-object-oriented code and because dynamic code analysis is more expensive and complex to perform. For modern software systems, however, this focus on static analysis can be problematic, because although dynamic binding existed before the advent of object-orientation, its usage has increased significantly in the last decade. This paper describes how coupling can be defined and precisely measured based on dynamic analysis of systems. We refer to this type of coupling as dynamic coupling. An empirical evaluation of the proposed dynamic coupling measures is reported in which we study the ...
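
One way to picture dynamic coupling is as the set of server classes a client class actually sends messages to during execution, counted from a run-time trace rather than from the source code. The sketch below does exactly that on an invented trace; the paper defines a whole family of dynamic coupling measures with more distinctions (object vs. class level, import vs. export, messages vs. distinct methods).

```python
# Distinct server classes per client class, counted from a run-time trace (illustrative).
from collections import defaultdict

trace = [                                   # (caller_class, callee_class, method)
    ("Order", "Customer", "getDiscount"),
    ("Order", "Customer", "getDiscount"),   # repeated call, same coupling link
    ("Order", "Inventory", "reserve"),
    ("Invoice", "Order", "total"),
    ("Invoice", "Customer", "getAddress"),
]

def dynamic_import_coupling(trace):
    servers = defaultdict(set)
    for caller, callee, _method in trace:
        if caller != callee:                # ignore self-calls
            servers[caller].add(callee)
    return {cls: len(s) for cls, s in servers.items()}

print(dynamic_import_coupling(trace))       # {'Order': 2, 'Invoice': 2}
```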

COBRA: a hybrid method for software cost estimation, benchmarking, and risk assessment

Current cost estimation techniques have a number of drawbacks. For example, developing algorithmic models requires extensive past project data. Also, off-the-shelf models have been found to be difficult to calibrate but inaccurate without calibration. Informal approaches based on experienced estimators depend on the estimators' availability and are not easily repeatable, as well as not being much more accurate than algorithmic techniques. In this paper we present a method for cost estimation that combines aspects of the algorithmic and experiential approaches (referred to as COBRA: COst estimation, Benchmarking, and Risk Assessment). We find through a case study that cost estimates using COBRA show an average ARE of 0.09, and that the results are easily usable for benchmarking and risk assessment purposes.
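
The hybrid flavor of the method can be approximated in a few lines: a size-based productivity baseline is inflated by expert-quantified cost-overhead factors, each given as a (minimum, most likely, maximum) percentage and sampled to obtain an effort distribution usable for risk assessment. Everything below (factor names, distributions, baseline productivity) is invented for illustration and greatly simplifies the causal model the method actually builds.

```python
# Simplified COBRA-style effort simulation with expert overhead factors (illustrative).
import random

random.seed(1)
SIZE_KLOC = 40
NOMINAL_PRODUCTIVITY = 0.5              # KLOC per person-month at zero overhead

OVERHEAD_FACTORS = {                    # extra effort in %: (min, most likely, max)
    "unstable requirements": (5, 15, 40),
    "low domain experience": (0, 10, 25),
    "strict reliability":    (10, 20, 50),
}

def simulate_effort():
    overhead = sum(random.triangular(lo, hi, mode) / 100
                   for lo, mode, hi in OVERHEAD_FACTORS.values())
    return SIZE_KLOC / NOMINAL_PRODUCTIVITY * (1 + overhead)

samples = sorted(simulate_effort() for _ in range(10_000))
print(f"median effort ≈ {samples[len(samples) // 2]:.0f} person-months, "
      f"90th percentile ≈ {samples[int(len(samples) * 0.9)]:.0f}")
```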

A Validation of Object-Oriented Design Metrics as Quality Indicators

IEEE Transactions on Software Engineering, 1996

This paper presents the results of a study conducted at the University of Maryland in which we experimentally investigated the suite of object-oriented (OO) design metrics introduced by [Chidamber & Kemerer, 1994]. In order to do this, we assessed these metrics as predictors of fault-prone classes. This study is complementary to [Li & Henry, 1993], where the same suite of metrics had been used to assess frequencies of maintenance changes to classes. To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements. All eight projects were developed using a sequential life cycle model, a well-known OO analysis/design method, and the C++ programming language. Based on the experimental results, the advantages and drawbacks of these OO metrics are discussed. Several of Chidamber & Kemerer's OO metrics appear to be useful for predicting class fault-proneness during the early phases of the life-cycle. We also show that they are, on our data set, better predictors than "traditional" code metrics, which can only be collected at a later phase of the software development process.
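
For readers unfamiliar with the Chidamber & Kemerer suite, three of its metrics (WMC, DIT, and a simplified, outgoing-only variant of CBO) are easy to compute on a toy class model, as sketched below. The model format is invented; in the study the metrics were collected from C++ code.

```python
# WMC, DIT, and a simplified CBO over a toy class model (illustrative).
classes = {
    # name: (parent, methods, classes referenced by fields/parameters)
    "Shape":     (None,    ["area", "perimeter"], set()),
    "Circle":    ("Shape", ["area", "perimeter", "radius"], {"Point"}),
    "Rectangle": ("Shape", ["area", "perimeter"], {"Point"}),
    "Point":     (None,    ["moveTo"], set()),
    "Canvas":    (None,    ["draw", "clear"], {"Shape", "Point"}),
}

def dit(name):
    """Depth of inheritance tree: number of ancestors."""
    parent = classes[name][0]
    return 0 if parent is None else 1 + dit(parent)

for name, (_parent, methods, refs) in classes.items():
    wmc = len(methods)                  # unit-weighted method count
    cbo = len(refs)                     # outgoing references only (simplified)
    print(f"{name:<10} WMC={wmc}  DIT={dit(name)}  CBO={cbo}")
```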

Theoretical and Empirical Validation of Software Product Measures

... useful. A measure is useful if it is related to a measure of some external attribute, e.g., maintainability. Measures of internal product attributes in software engineering are artificial concepts and do not hold any meaning in themselves. For ...

On the application of measurement theory in software engineering

Empirical Software Engineering, 1996

Elements of measurement theory have recently been introduced into the software engineering discipline. It has been suggested that these elements should serve as the basis for developing, reasoning about, and applying measures. For example, it has been suggested that software complexity measures should be additive, that measures fall into a number of distinct types (i.e., levels of measurement: nominal, ordinal, interval, and ratio), that certain statistical techniques are not appropriate for certain types of measures (e.g., parametric statistics for less-than-interval measures), and that certain transformations are not permissible for certain types of measures (e.g., non-linear transformations for interval measures). In this paper we argue that, in spite of the importance of measurement theory, and in the context of software engineering, many of these prescriptions and proscriptions are either premature or, if strictly applied, would represent a substantial hindrance to the progress of empirical research in software engineering. This argument is based partially on studies that have been conducted by behavioral scientists and by statisticians over the last five decades. We also present a pragmatic approach to the application of measurement theory in software engineering. While following our approach may lead to violations of the strict prescriptions and proscriptions of measurement theory, we demonstrate that in practical terms these violations would have diminished consequences, especially when compared to the advantages afforded to the practicing researcher.
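
A small worked example helps to see what the strict proscriptions are protecting against: means of ordinal scores are not invariant under order-preserving rescaling, so a comparison of means can flip when the underlying scale is only ordinal. The scores and rescaling below are made up; the paper's argument is that, in practice, the consequences of such violations are often far less dramatic than this worst case suggests.

```python
# A monotone rescaling of ordinal scores reverses a comparison of means.
import statistics

group_a = [1, 5, 5]                          # ordinal ratings on a 1-5 scale
group_b = [4, 4, 4]
rescale = {1: 1, 2: 2, 3: 3, 4: 4, 5: 100}   # still order-preserving

mean = statistics.mean
print(mean(group_a) > mean(group_b))                                               # False
print(mean([rescale[x] for x in group_a]) > mean([rescale[x] for x in group_b]))   # True
```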

Defining and Validating Measures for Object-Based High-Level Design

IEEE Transactions on Software Engineering, 1999

The availability of significant measures in the early phases of the software development lifecycle allows for better management of the later phases, and more effective quality assessment when quality can be more easily affected by preventive or corrective actions. In this paper, we introduce and compare various high-level design measures for object-based software systems. The measures are derived based on an experimental goal, identifying fault-prone software parts, and several experimental hypotheses arising from the development of Ada systems for Flight Dynamics Software at the NASA Goddard Space Flight Center (NASA/GSFC). Specifically, we define a set of measures for cohesion and coupling, and theoretically analyze them by checking their compliance with a previously published set of mathematical properties that we deem important. We then investigate their relationship to fault-proneness on three large-scale projects, to provide empirical support for their practical significance and usefulness.

Developing Interpretable Models with Optimized Set Reduction for Identifying High-Risk Software Components

IEEE Transactions on Software Engineering, 1993

... originate during the software-development process. In Section II, we present an evolved version of the OSR algorithm (an earlier version of the OSR approach was applied to project cost estimation and published in [12]), which is intended to make OSR models more accurate ...

Characterizing and assessing a large-scale software maintenance organization

One important component of a software process is the organizational context in which the process is enacted. This component is often missing or incomplete in current process modeling approaches. One technique for modeling this perspective is the Actor-Dependency (AD) Model. This paper reports on a case study which used this approach to analyze and assess a large software maintenance organization. Our goal was to identify the approach's strengths and weaknesses while providing practical recommendations for improvement and research directions. The AD model was found to be very useful in capturing the important properties of the organizational context of the maintenance process, and aided in the understanding of the flaws found in this process. However, a number of opportunities for extending and improving the AD model were identified. Among others, there is a need to incorporate quantitative information to complement the qualitative model.
