Design and development of frameworks for CPU verification efficiency improvement
Related papers
Annual Computer Security Applications Conference
Among various fuzzing approaches, coverage-guided grey-box fuzzing is perhaps the most prominent, due to its ease of use and effectiveness. Using this approach, the selection of inputs focuses on maximizing program coverage, e.g., in terms of the different branches that have been traversed. In this work, we begin with the observation that selecting any input that explores a new path, and giving equal weight to all paths, can lead to severe inefficiencies. For instance, although seemingly "new" crashes involving previously unexplored paths may be discovered, these often have the same root cause and actually correspond to the same bug. To address these inefficiencies, we introduce a framework that incorporates a tighter feedback loop to guide the fuzzing process in exploring truly diverse code paths. Our framework employs (i) a vulnerability-aware selection of coverage metrics for enhancing the effectiveness of code exploration, (ii) crash deduplication information for early feedback, and (iii) a configurable input culling strategy that interleaves multiple strategies to achieve comprehensiveness. A novel aspect of our work is the use of hardware performance counters to derive coverage metrics. We present an approach for assessing and selecting the hardware events that can be used as a meaningful coverage metric for a target program. The results of our empirical evaluation using real-world programs demonstrate the effectiveness of our approach: in some cases, we explore fewer than 50% of the paths compared to a base fuzzer (AFL, MOpt, and Fairfuzz), yet on average, we improve new bug discovery by 31%, and find the same bugs (as the base) 3.3 times faster. Moreover, although we specifically chose applications that have been subject to recent fuzzing campaigns, we still discovered 9 new vulnerabilities.
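The abstract does not detail how counter readings become a coverage signal, so the following is only a minimal Python sketch of the general idea: raw hardware-event counts are bucketed, and an input is kept when it hits a previously unseen bucket. The event names, the log2 bucketing, and the run_and_count_events() helper are illustrative assumptions, not the paper's implementation; in real use the helper would run the target under perf_event_open or a `perf stat` wrapper.

```python
# Sketch: bucketed hardware-event counts as a coverage signal for corpus culling.
import math

EVENTS = ["branch-misses", "L1-dcache-load-misses"]  # events assumed meaningful for the target

def bucket(count: int) -> int:
    """Map a raw event count to a coarse log2 bucket, AFL-style."""
    return 0 if count <= 0 else int(math.log2(count))

def run_and_count_events(input_bytes: bytes) -> dict:
    """Hypothetical stub: real use would execute the target under perf and
    return raw counts for the chosen events; here we just simulate counts."""
    return {"branch-misses": len(input_bytes) * 37,
            "L1-dcache-load-misses": len(input_bytes) * 11}

class HPCCoverage:
    def __init__(self):
        self.seen = set()                     # (event, bucket) pairs observed so far

    def is_interesting(self, input_bytes: bytes) -> bool:
        counts = run_and_count_events(input_bytes)
        new = {(e, bucket(counts.get(e, 0))) for e in EVENTS} - self.seen
        self.seen |= new                      # keep the input only if it hit a new bucket
        return bool(new)

cov = HPCCoverage()
print(cov.is_interesting(b"AAAA"))   # True: first buckets observed
print(cov.is_interesting(b"AAAB"))   # False: same buckets, nothing new
```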
Supporting Software Inspection with Static Profiling
Static software checking tools are useful as an additional automated software inspection step that can easily be integrated into the development cycle and assist in creating secure, reliable, and high-quality code. However, an often-cited disadvantage of these tools is that they generate an inordinate number of warnings, including many false positives due to the use of approximate analysis techniques. This information overload effectively limits their usefulness.
Automatic Microprocessor Performance Bug Detection
2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021
Processor design validation and debugging is a difficult and complex task that consumes the lion's share of the design process. Design bugs that affect processor performance rather than its functionality are especially difficult to catch, particularly in new microarchitectures. This is because, unlike functional bugs, the correct processor performance of new microarchitectures on complex, long-running benchmarks is typically not deterministically known. Thus, when benchmarking new microarchitectures, performance teams may assume that the design is correct as long as its performance exceeds that of the previous generation, even though significant performance regressions may still exist in the design. In this work we present a two-stage, machine learning-based methodology that is able to detect the existence of performance bugs in microprocessors. Our results show that our best technique detects 91.5% of microprocessor core performance bugs whose average IPC impact across the studied applications is greater than 1% relative to a bug-free design, with zero false positives. When evaluated on memory system bugs, our technique achieves 100% detection with zero false positives. Moreover, the detection is automatic, requiring very little performance engineer time.
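The abstract does not spell out the two-stage methodology, so the following is only a rough Python sketch of the broader idea: learn what "normal" performance looks like from a trusted design, then flag configurations whose measured IPC falls well below the model's prediction. The feature set, the 1% tolerance, and the choice of a gradient-boosted regressor are assumptions for illustration, not the paper's exact method.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def train_reference_model(features: np.ndarray, ipc: np.ndarray):
    """features: per-benchmark microarchitectural event rates on a bug-free design."""
    model = GradientBoostingRegressor()
    model.fit(features, ipc)
    return model

def flag_performance_bug(model, features: np.ndarray, measured_ipc: np.ndarray,
                         tolerance: float = 0.01) -> bool:
    """Flag the candidate design if its mean IPC shortfall vs. the prediction
    exceeds `tolerance` (1% here, mirroring the impact threshold cited above)."""
    predicted = model.predict(features)
    shortfall = np.mean((predicted - measured_ipc) / predicted)
    return shortfall > tolerance
```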
Striking a new balance between program instrumentation and debugging time
Proceedings of the sixth conference on Computer systems - EuroSys '11, 2011
Although they are helpful in many cases, state-of-the-art bug reporting systems may impose excessive overhead on users, leak private information, or provide little help to the developer in locating the problem. In this paper, we explore a new approach to bug reporting that uses partial logging of branches to record the path leading to a bug. We use static and dynamic analysis (both in isolation and in tandem) to identify the branches that need to be logged. When a bug is encountered, the system uses symbolic execution along the partial branch trace to reproduce the problem and find a set of inputs that activate the bug. The partial branch log drastically reduces the number of paths that would otherwise need to be explored by the symbolic execution engine. We study the tradeoff between instrumentation overhead and debugging time using an open-source Web server, the diff utility, and four coreutils programs. Our results show that the instrumentation method that combines static and dynamic analysis strikes the best compromise, as it limits both the overhead of branch logging and the bug reproduction time. We conclude that our techniques represent an important step in improving bug reporting and making symbolic execution more practical for bug reproduction.
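As a toy illustration of partial branch logging, the sketch below records outcomes only for branch sites chosen by a prior analysis, producing the compact trace a symbolic-execution engine would later follow to reproduce the failing path. The selected sites and the instrumented parser are invented for the example and are not from the paper.

```python
LOGGED_SITES = {"parse:loop", "parse:overflow_check"}   # sites selected by the analysis
branch_trace = []                                        # (site_id, taken) pairs

def log_branch(site_id: str, taken: bool) -> bool:
    if site_id in LOGGED_SITES:                          # unselected sites cost nothing
        branch_trace.append((site_id, taken))
    return taken

def parse(data: bytes):
    i = 0
    while log_branch("parse:loop", i < len(data)):
        if log_branch("parse:overflow_check", data[i] > 0x7f):
            raise ValueError("suspicious byte")          # the failure the trace helps reproduce
        i += 1

try:
    parse(b"ok\xff")
except ValueError:
    print("partial branch trace:", branch_trace)
```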
On Automatic Detection of Performance Bugs
2016 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), 2016
Software performance is a critical non-functional requirement in many fields, such as mission-critical applications, financial systems, and real-time systems. In this work we focused on early detection of performance bugs; our software under study was a real-time system used in the advertising/marketing domain. Goal: Find a simple and easy-to-implement solution for predicting performance bugs. Method: We built several models using four machine learning methods commonly used for defect prediction: C4.5 decision trees, Naïve Bayes, Bayesian networks, and logistic regression. Results: Our empirical results show that a C4.5 model, using lines of code changed, file age, and file size as explanatory variables, can be used to predict performance bugs (recall = 0.73, accuracy = 0.85, and precision = 0.96). We show that reducing the number of changes delivered in a commit can decrease the chance of injecting a performance bug. Conclusions: We believe that our approach can help practitioners eliminate performance bugs early in the development cycle. Our results are also of interest to theoreticians, establishing a link between functional bugs and (non-functional) performance bugs, and explicitly showing that attributes used for predicting functional bugs can also be used for predicting performance bugs.
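A minimal sketch of the kind of model reported above, using scikit-learn (which implements CART rather than C4.5) over the three named features. The tiny inline dataset is fabricated purely to make the snippet runnable; in practice these features would be mined from the version history.

```python
from sklearn.tree import DecisionTreeClassifier

# columns: [lines_of_code_changed, file_age_days, file_size_loc]
X = [[350, 30, 2000], [12, 900, 400], [540, 15, 3500], [8, 1200, 250],
     [220, 60, 1800], [5, 800, 300]]
y = [1, 0, 1, 0, 1, 0]        # 1 = performance bug later observed in the change, 0 = clean

clf = DecisionTreeClassifier(max_depth=3, random_state=0)   # C4.5-like tree; sklearn uses CART
clf.fit(X, y)

# Score an incoming commit: a large change to a young, large file looks risky.
print(clf.predict([[400, 20, 2500]]))
```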
A unified framework for declarative debugging and testing
Information and Software Technology, 2021
Debugging is the most challenging and time-consuming task in software development. However, it is not properly integrated into the software development cycle, because the result of so much effort is not available in later iterations of the cycle, and the debugging process itself does not benefit from the outcome of other phases such as testing. Objective: We propose to integrate debugging and testing within a single unified framework where each phase generates useful information for the other and the outcomes of each phase are reused. Method: We consider a declarative debugging setting that employs tests to automatically entail the validity of some subcomputations, thus decreasing the time and effort needed to find a bug. Additionally, the debugger stores the information collected from the user during the debugging phase as new tests. This information becomes part of the program's test suite and can be used in future debugging sessions, and also as regression tests. Results: We define a general framework where declarative debugging establishes a bidirectional collaboration with testing. The new setting preserves the properties of the underlying declarative debugging framework (weak completeness and soundness) while generating test cases that can be used later in other debugging sessions or even in other cycles of the software development process. The proposed framework is general enough to be instantiated for very different programming languages: Erlang (functional), Java (imperative, object-oriented), and SQL (data query); the experimental results obtained for Erlang programs validate the effectiveness of the framework. Conclusion: We propose a general unified framework for debugging and testing that simplifies each phase and maximizes the reusability of the outcomes in the different phases of the software development cycle, thereby reducing the overall effort.
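The following is a schematic Python sketch of the bidirectional loop described above: the debugger consults stored tests before asking the user, and every user answer is kept for later sessions. The call representation (a hashable key such as a function name plus arguments), the prompt, and the data structures are illustrative, not the framework's actual API.

```python
test_suite = {}      # call -> result the user judged correct; reused across sessions
known_wrong = set()  # subcomputations already marked as buggy

def ask_user(call, result) -> bool:
    answer = input(f"Is {call} = {result!r} correct? [y/n] ")
    return answer.strip().lower().startswith("y")

def judge(call, result) -> bool:
    """Validity of the subcomputation `call -> result`, asking the user only as a last resort."""
    if call in test_suite:
        return test_suite[call] == result     # existing tests answer for free
    if (call, result) in known_wrong:
        return False
    valid = ask_user(call, result)
    if valid:
        test_suite[call] = result             # stored as a new (regression) test
    else:
        known_wrong.add((call, result))
    return valid
```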
Statistical Debugging: A Hypothesis Testing-Based Approach
IEEE Transactions on Software Engineering, 2006
Manual debugging is tedious, as well as costly. The high cost has motivated the development of fault localization techniques, which help developers search for fault locations. In this paper, we propose a new statistical method, called SOBER, which automatically localizes software faults without any prior knowledge of the program semantics. Unlike existing statistical approaches that select predicates correlated with program failures, SOBER models the predicate evaluation in both correct and incorrect executions and regards a predicate as fault-relevant if its evaluation pattern in incorrect executions significantly diverges from that in correct ones. Featuring a rationale similar to that of hypothesis testing, SOBER quantifies the fault relevance of each predicate in a principled way. We systematically evaluate SOBER under the same setting as previous studies. The results clearly demonstrate its effectiveness: SOBER could help developers locate 68 out of the 130 faults in the Siemens suite by examining no more than 10 percent of the code, whereas the Cause Transition approach proposed by Cleve and Zeller [6] and the statistical approach by Liblit et al. [12] locate 34 and 52 faults, respectively. Moreover, the effectiveness of SOBER is also evaluated in an "imperfect world," where the test suite is either inadequate or only partially labeled. The experiments indicate that SOBER could achieve competitive quality under these harsh circumstances. Two case studies with grep 2.2 and bc 1.06 are reported, which shed light on the applicability of SOBER to reasonably large programs.
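A simplified Python rendering of the ranking idea: for each predicate, compare the distribution of its per-run evaluation bias (fraction of true evaluations) in failing versus passing runs, and score predicates by how far the two diverge. The z-style statistic below is an approximation standing in for the paper's hypothesis-testing formulation, not a reimplementation of SOBER.

```python
import statistics

def evaluation_bias(n_true: int, n_false: int) -> float:
    total = n_true + n_false
    return n_true / total if total else 0.5

def fault_relevance(passing_biases, failing_biases) -> float:
    """Larger score = the predicate behaves more differently in failing runs."""
    mu = statistics.mean(passing_biases)
    sigma = statistics.pstdev(passing_biases) or 1e-9
    n = len(failing_biases)
    return abs(statistics.mean(failing_biases) - mu) * (n ** 0.5) / sigma

# Example: a predicate such as "idx > len" that is almost never true in passing
# runs but frequently true in failing runs ranks as highly fault-relevant.
print(fault_relevance([0.0, 0.05, 0.0, 0.02], [0.6, 0.7, 0.5]))
```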
Configuring Test Generators using Bug Reports: A Case Study of GCC Compiler and Csmith
The 36th Annual ACM Symposium on Applied Computing, Software Verification and Testing Track (SAC-SVT'21), 2021
The correctness of compilers is instrumental to the safety and reliability of other software systems, as bugs in compilers can produce executables that do not reflect the intent of programmers. Such errors are difficult to identify and debug. Random test program generators are commonly used in testing compilers, and they have been effective in uncovering bugs. However, guiding these test generators to produce test programs that are more likely to find bugs remains challenging. In this paper, we use the code snippets in bug reports to guide test generation. The main idea of this work is to extract insights from the bug reports about the language features that are more prone to inadequate implementation and to use those insights to guide the test generators. We use the GCC C compiler to evaluate the effectiveness of this approach. In particular, we first cluster the test programs in the GCC bug reports based on their features. We then use the centroids of the clusters to compute configurations for Csmith, a popular test generator for C compilers. We evaluated this approach on eight versions of GCC and found that it provides higher coverage and triggers more miscompilation failures than the state-of-the-art test generation techniques for GCC.
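A conceptual Python sketch of the clustering step: vectorize bug-report test programs by language-feature usage, cluster them, and turn each centroid into feature weights for the generator. The feature names, the toy data, and the final mapping onto Csmith's actual probability-configuration options are assumptions for illustration; the real option names and file format are not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans

FEATURES = ["pointer_arith", "bitfields", "unions", "volatile", "nested_loops"]

# rows = bug-report snippets, columns = normalized usage counts of each feature
X = np.array([[0.8, 0.1, 0.0, 0.3, 0.6],
              [0.7, 0.0, 0.1, 0.4, 0.5],
              [0.1, 0.9, 0.7, 0.0, 0.2],
              [0.2, 0.8, 0.6, 0.1, 0.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for i, centroid in enumerate(km.cluster_centers_):
    # Re-weight generation probabilities toward the features this cluster stresses.
    weights = {f: round(float(w), 2) for f, w in zip(FEATURES, centroid)}
    print(f"generator config for cluster {i}: {weights}")
```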
Runtime Verification Driven Debugging of Replayed Errors
Proceedings of the PhD Workshop of ICTSS, 2013
Bugs in embedded software often depend on inputs from the target environment, which are often difficult to generate for testing. Our approach traces external inputs to the software during operation until a failure occurs. The execution trace is then replayed in the laboratory. During replay, automated verification techniques are applied to check whether the software violates the requirements of its specification. A detected violation can point the developer toward the cause of the failure in the source code. The developer does not have to step through the entire replayed execution with a debugger to find the reason for the failure, which accelerates the analysis process.
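A small toy example of checking a property while replaying a recorded trace: the monitor walks the recorded events and reports where the specification is first violated, so the developer can jump straight to that region of the trace. The trace format and the property (at most one outstanding request) are invented for illustration and are not from the paper.

```python
recorded_trace = ["request", "response", "request", "request", "response"]

def replay_with_monitor(trace):
    outstanding = 0
    for step, event in enumerate(trace):
        if event == "request":
            outstanding += 1
        elif event == "response":
            outstanding -= 1
        if outstanding > 1:                  # specification: at most one pending request
            return f"violation at step {step}: new request before the previous one was answered"
    return "no violation detected"

print(replay_with_monitor(recorded_trace))
```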