Bor-Yuh Evan Chang | University of Colorado, Boulder
Papers by Bor-Yuh Evan Chang
Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, 2016
Today's programmers face a false choice between creating software that is extensible and software that is correct. Specifically, dynamic languages permit software that is richly extensible (via dynamic code loading, dynamic object extension, and various forms of reflection), and today's programmers exploit this flexibility to "bring their own language features" to enrich extensible languages (e.g., by using common JavaScript libraries). Meanwhile, such library-based language extensions generally lack enforcement of their abstractions, leading to programming errors that are complex to avoid and predict. To offer verification for this extensible world, we propose online verification-validation (OVV), which consists of language and VM design that enables a "phaseless" approach to program analysis, in contrast to the standard static-dynamic phase distinction. Phaseless analysis freely interposes abstract interpretation with concrete execution, allowing analyses to use dynamic (concrete) information to prove universal (abstract) properties about future execution. In this paper, we present a conceptual overview of OVV through a motivating example program that uses a hypothetical database library. We present a generic semantics for OVV, and an extension to this semantics that offers a simple gradual type system for the database library primitives. The result of instantiating this gradual type system in an OVV setting is a checker that can progressively type successive continuations of the program until a continuation is fully verified. To evaluate the proposed vision of OVV for this example, we implement the VM semantics (in Rust), and show that this design permits progressive typing in this manner.
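A minimal OCaml sketch of the phaseless idea (all names and the toy "database" check are hypothetical; the paper's VM is implemented in Rust): before each concrete step, the runtime attempts to verify the remaining continuation abstractly, and dynamic validation is dropped once a continuation is fully verified.

```ocaml
(* Minimal sketch of "phaseless" online verification-validation (OVV),
   with hypothetical types; not the paper's Rust VM. A program is a list
   of commands; before each concrete step we try to verify the remaining
   continuation abstractly, and stop validating once it is proven. *)

type cmd = Insert of string * int | Query of string

(* Stub abstract checker for the hypothetical database primitives: a toy
   "gradual type" check that every queried column is inserted earlier in
   the continuation or already known. *)
let rec verifies known = function
  | [] -> true
  | Insert (col, _) :: rest -> verifies (col :: known) rest
  | Query col :: rest -> List.mem col known && verifies known rest

let run prog =
  let rec step known verified = function
    | [] -> ()
    | c :: rest ->
        (* Phaseless interposition: attempt a static proof of the future. *)
        let verified = verified || verifies known (c :: rest) in
        (match c with
         | Insert (col, v) ->
             Printf.printf "insert %s=%d%s\n" col v
               (if verified then " (continuation verified)" else "");
             step (col :: known) verified rest
         | Query col ->
             (* Dynamic validation only while unverified. *)
             if (not verified) && not (List.mem col known) then
               failwith ("ill-typed query on " ^ col);
             Printf.printf "query %s\n" col;
             step known verified rest)
  in
  step [] false prog

let () = run [ Insert ("age", 3); Query "age" ]
```

The stub `verifies` checker stands in for the gradual type system; the point is only the interleaving of concrete steps with attempted proofs about future execution.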
Springer eBooks, 2002
We believe that fundamental to the establishment of a grid computing framework where all (not just large organizations) are able to effectively tap into the resources available on the global network is the establishment of trust between grid application developers and resource donors. Resource donors must be able to trust that their security, safety, and privacy policies will be respected by programs that use their systems. In this paper, we present a novel solution based on the notion of certified code that upholds safety, security, and privacy policies by examining intrinsic properties of code. Certified code complements authentication and provides a foundation for a safe, secure, and efficient framework that executes native code. We describe the implementation of such a framework known as the ConCert software.
arXiv (Cornell University), Nov 16, 2015
This paper presents incremental verification-validation, a novel approach for checking rich data structure invariants expressed as separation logic assertions. Incremental verification-validation combines static verification of separation properties with efficient, short-circuiting dynamic validation of arbitrarily rich data constraints. A data structure invariant checker is an inductive predicate in separation logic with an executable interpretation; a short-circuiting checker is an invariant checker that stops checking whenever it detects at run time that an assertion for some sub-structure has been fully proven statically. At a high level, our approach does two things: it statically proves the separation properties of data structure invariants using a static shape analysis in a standard way but then leverages this proof in a novel manner to synthesize short-circuiting dynamic validation of the data properties. As a consequence, we enable dynamic validation to make up for imprecision in sound static analysis while simultaneously leveraging the static verification to make the remaining dynamic validation efficient. We show empirically that short-circuiting can yield asymptotic improvements in dynamic validation, with low overhead over no validation, even in cases where static verification is incomplete.
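To make the short-circuiting idea concrete, a hedged OCaml sketch (the node-level `proven` flag is a hypothetical encoding of "this sub-structure's assertion was discharged statically"): dynamic validation of a rich data constraint, here sortedness, stops as soon as it reaches a statically verified region.

```ocaml
(* Sketch of a short-circuiting invariant checker (hypothetical encoding).
   A node's [proven] flag stands for "the assertion for this sub-structure
   was fully proven by the static shape analysis", so dynamic validation
   can stop there instead of traversing the whole structure. *)

type node = { data : int; proven : bool; next : node option }

(* Sorted-list checker: the rich data constraint (sortedness) is validated
   dynamically, but only on the statically unproven prefix. *)
let rec sorted_from lo = function
  | None -> true
  | Some n ->
      if n.proven then true (* short-circuit: proven statically *)
      else lo <= n.data && sorted_from n.data n.next

let check l = sorted_from min_int (Some l)

let () =
  let tail = { data = 7; proven = true; next = None } in  (* verified part *)
  let l = { data = 1; proven = false; next = Some tail } in
  assert (check l)
```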
Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
We present a novel approach to differential cost analysis that, given a program revision, attempts to statically bound the difference in resource usage, or cost, between the two program versions. Differential cost analysis is particularly interesting because of the many compelling applications for it, such as detecting resource-use regressions at code-review time or proving the absence of certain side-channel vulnerabilities. One prior approach to differential cost analysis is to apply relational reasoning that conceptually constructs a product program on which one can over-approximate the difference in costs between the two program versions. However, a significant challenge in any relational approach is effectively aligning the program versions to get precise results. In this paper, our key insight is that we can avoid the need for and the limitations of program alignment if, instead, we bound the difference of two cost-bound summaries rather than directly bounding the concrete cost difference. In particular, ... (This paper describes work performed in part while Đorđe Žikelić was an Applied Scientist Intern at Amazon. Bor-Yuh Evan Chang holds concurrent appointments at the University of Colorado Boulder and as an Amazon Scholar; this paper describes work performed at Amazon and is not associated with CU Boulder. Franco Raimondi holds concurrent appointments at Middlesex University and as an Amazon Scholar; this paper describes work performed at Amazon and is not associated with Middlesex.)
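A toy illustration of the key insight, assuming cost-bound summaries degenerate to constant intervals (the paper computes symbolic bounds): subtracting the two versions' summaries bounds the cost difference with no alignment of the programs.

```ocaml
(* Sketch under a simplifying assumption: a cost-bound summary is a
   constant interval [lo, hi] on a procedure's cost. Bounding the
   difference of two summaries needs no alignment of the two versions. *)

type summary = { lo : int; hi : int }

(* Sound bound on (cost of v1 - cost of v2): interval subtraction. *)
let diff_bound (v1 : summary) (v2 : summary) : summary =
  { lo = v1.lo - v2.hi; hi = v1.hi - v2.lo }

let () =
  let before = { lo = 10; hi = 12 }    (* summary of the old version *)
  and after = { lo = 10; hi = 15 } in  (* summary of the revision *)
  let d = diff_bound after before in
  Printf.printf "cost change in [%d, %d]\n" d.lo d.hi
  (* prints: cost change in [-2, 5], a possible regression of up to 5 *)
```

Because each version is summarized independently, no product program or statement-level alignment is needed.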
A linear syntax for natural deduction proofs in first-order intuitionistic logic is presented, which has been an effective tool for teaching logic. The proof checking algorithm is also given, which is the core of the tutorial proof checker Tutch. This syntax is then extended to proofs on the assertion level which resemble single inferences one would make in a rigorous proof. The resulting language has only four constructs. Checking of these proofs is decidable, and an efficient algorithm is given.
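As a flavor of the linear syntax, a hedged OCaml sketch of a checker for just the implication fragment (the AST and justification rule are simplifications, not Tutch's actual four constructs): a frame [A; ...; B] proves A => B, and a plain line must follow from the visible lines.

```ocaml
(* Sketch of a Tutch-style linear proof checker for the implication
   fragment of intuitionistic logic (a hypothetical simplification). *)

type prop = Atom of string | Imp of prop * prop

type step =
  | Line of prop              (* must be justified by earlier lines *)
  | Frame of prop * step list (* hypothesis, sub-proof *)

(* [check visible steps] returns the props proved by [steps], given the
   [visible] props of enclosing frames; fails on an unjustified line. *)
let rec check visible steps =
  let justified ctx p =
    List.exists
      (function Imp (a, b) -> b = p && List.mem a ctx | _ -> false) ctx
    || List.mem p ctx
  in
  let step ctx = function
    | Line p ->
        if justified ctx p then p :: ctx else failwith "unjustified line"
    | Frame (h, body) ->
        (match check (h :: ctx) body with
         | last :: _ -> Imp (h, last) :: ctx
         | [] -> failwith "empty frame")
  in
  List.fold_left step visible steps

(* Proof of A => ((A => B) => B) in linear syntax:
   [A; [A => B; B]; (A => B) => B] *)
let a = Atom "A" and b = Atom "B"
let proof =
  [ Frame (a, [ Frame (Imp (a, b), [ Line b ]); Line (Imp (Imp (a, b), b)) ]) ]
let () = ignore (check [] proof); print_endline "proof checked"
```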
To make development of grid applications less arduous, a natural, powerful, and convenient programming interface is required. First, we propose an expressive grid programming language which we hope will provide such an interface. Then we show how to map programs in this language onto a low-level, more compact architecture that can more easily provide the fault tolerance and inexpensive scheduling suitable for grid computing. Finally, we discuss programming techniques for taking advantage of the underlying architecture, as well as issues to be resolved in future work.
We reexamine the foundations of linear logic, developing a system of natural deduction following Martin-Löf's separation of judgments from propositions. Our construction yields a clean and elegant formulation that accounts for a rich set of multiplicative, additive, and exponential connectives, extending dual intuitionistic linear logic but differing from both classical linear logic and Hyland and de Paiva's full intuitionistic linear logic. We also provide a corresponding sequent calculus that admits a simple proof of the admissibility of cut by a single structural induction. Finally, we show how to interpret classical linear logic (with or without the MIX rule) in our system, employing a form of double-negation translation.
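For orientation, the standard DILL-style rules that such a judgmental formulation refines (a sketch of the conventional presentation, not the paper's exact system): the hypothetical judgment Γ; Δ ⊢ A separates unrestricted assumptions Γ from linear assumptions Δ, which must be used exactly once.

```latex
% Standard dual-intuitionistic-linear-logic-style rules (sketch):
\[
\frac{\Gamma; \Delta, A \vdash B}{\Gamma; \Delta \vdash A \multimap B}
\;{\multimap}I
\qquad
\frac{\Gamma; \Delta_1 \vdash A \multimap B \quad \Gamma; \Delta_2 \vdash A}
     {\Gamma; \Delta_1, \Delta_2 \vdash B}
\;{\multimap}E
\qquad
\frac{\Gamma; \cdot \vdash A}{\Gamma; \cdot \vdash\, !A}
\;{!}I
\]
```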
Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021
We consider the problem of making expressive static analyzers interactive. Formal static analysis is seeing increasingly widespread adoption as a tool for verification and bug-finding, but even with powerful cloud infrastructure it can take minutes or hours to get batch analysis results after a code change. While existing techniques offer some demand-driven or incremental aspects for certain classes of analysis, the fundamental challenge we tackle is doing both for arbitrary abstract interpreters. Our technique, demanded abstract interpretation, lifts program syntax and analysis state to a dynamically evolving graph structure, in which program edits, client-issued queries, and evaluation of abstract semantics are all treated uniformly. The key difficulty addressed by our approach is the application of general incremental computation techniques to the complex, cyclic dependency structure induced by abstract interpretation of loops with widening operators. We prove that desirable abstract interpretation meta-properties, including soundness and termination, are preserved in our approach, and that demanded analysis results are equal to those computed by a batch abstract interpretation. Experimental results suggest promise for a prototype demanded abstract interpretation framework: by combining incremental and demand-driven techniques, our framework consistently delivers analysis results at interactive speeds, answering 95% of queries within 1.2 seconds. (Bor-Yuh Evan Chang holds concurrent appointments at the University of Colorado Boulder and as an Amazon Scholar; this paper describes work performed at CU Boulder and is not associated with Amazon.)
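A hedged OCaml sketch of the demand-driven plus incremental combination (structure hypothetical, and restricted to an acyclic dependency graph; handling the cycles that loops with widening induce is precisely the paper's key difficulty): queries compute and memoize abstract states on demand, and an edit dirties only downstream results.

```ocaml
(* Sketch of demanded analysis over an acyclic dependency graph
   (hypothetical structure): queries compute abstract states on demand
   and memoize them; an edit dirties downstream results. *)

type sign = Bot | Neg | Zero | Pos | Top

let join a b = if a = b || b = Bot then a else if a = Bot then b else Top

type node = { id : int; preds : node list; transfer : sign -> sign }

let memo : (int, sign) Hashtbl.t = Hashtbl.create 16

(* Demand-driven: only the query's transitive predecessors are analyzed. *)
let rec query n =
  match Hashtbl.find_opt memo n.id with
  | Some s -> s
  | None ->
      let input = List.fold_left (fun acc p -> join acc (query p)) Bot n.preds in
      let s = n.transfer (match n.preds with [] -> Top | _ -> input) in
      Hashtbl.add memo n.id s;
      s

(* Incremental: a program edit at [n] invalidates memoized results below it. *)
let rec dirty all n =
  Hashtbl.remove memo n.id;
  List.iter (fun m -> if List.memq n m.preds then dirty all m) all

let () =
  let entry = { id = 0; preds = []; transfer = (fun _ -> Pos) } in
  let exit_ = { id = 1; preds = [ entry ]; transfer = (fun s -> s) } in
  assert (query exit_ = Pos);
  dirty [ entry; exit_ ] entry; (* simulate an edit at the entry statement *)
  assert (query exit_ = Pos)
```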
Proceedings of the ACM on Programming Languages, 2019
Static analysis tools for JavaScript must strike a delicate balance, achieving the level of precision required by the most complex features of target programs without incurring prohibitively high analysis time. For example, reasoning about dynamic property accesses sometimes requires precise relational information connecting the object, the dynamically-computed property name, and the property value. Even a minor precision loss at such critical program locations can result in a proliferation of spurious dataflow that renders the analysis results useless. We present a technique by which a conventional non-relational static dataflow analysis can be combined soundly with a value refinement mechanism to increase precision on demand at critical locations. Crucially, our technique is able to incorporate relational information from the value refinement mechanism into the non-relational domain of the dataflow analysis. We demonstrate the feasibility of this approach by extending an existing ...
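A small OCaml sketch of the interface between the two mechanisms (the names and the refinement oracle are hypothetical): the non-relational domain tracks sets of possible property-name strings, and an on-demand refinement intersects relational information into that domain at a critical access site.

```ocaml
(* Sketch with hypothetical names: a non-relational domain maps each
   variable to a set of possible property-name strings; at a critical
   dynamic access o[k], an on-demand value refinement (here, a stub
   oracle) narrows that set before it pollutes the dataflow. *)

module S = Set.Make (String)

type store = (string * S.t) list (* non-relational abstract store *)

let lookup x (st : store) = try List.assoc x st with Not_found -> S.empty

(* Hypothetical refinement oracle: relational reasoning about how [k] was
   computed returns a smaller set, soundly intersected into the domain. *)
let refine _var refined_set current = S.inter current refined_set

let () =
  let st = [ ("k", S.of_list [ "getX"; "getY"; "toString"; "eval" ]) ] in
  let before = lookup "k" st in
  (* On demand at the access site o[k]: refinement proves k is a getter. *)
  let after = refine "k" (S.of_list [ "getX"; "getY" ]) before in
  Printf.printf "|k| narrowed from %d to %d names\n"
    (S.cardinal before) (S.cardinal after)
```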
ACM SIGPLAN Notices, 2017
To infer complex structural invariants, shape analyses rely on expressive families of logical properties. Many such analyses manipulate abstract memory states that consist of separating conjunctions of basic predicates describing atomic blocks or summaries. Moreover, they use finite disjunctions of abstract memory states in order to account for dissimilar shapes. Disjunctions should be kept small for the sake of scalability, though precision often requires keeping additional case splits. In this context, deciding when and how to merge case splits and to replace them with summaries is critical both for precision and for efficiency. Existing techniques use sets of syntactic rules, which are tedious to design and prone to failure. In this paper, we design a semantic criterion to clump abstract states based on their silhouette, which applies not only to the conservative union of disjuncts, but also to the weakening of separating conjunction of memory predicates into inductive summaries.
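A toy OCaml sketch of clumping by silhouette (the silhouette here is just a normalized edge set; the paper computes a richer abstraction of the memory state's shape): disjuncts with equal silhouettes are joined, while dissimilar shapes stay as separate case splits.

```ocaml
(* Sketch of clumping by silhouette, with a toy abstraction: a disjunct
   is a set of points-to edges, and its "silhouette" is the normalized
   edge set. Disjuncts with equal silhouettes are merged. *)

type disjunct = (string * string) list (* edges: (src kind, dst kind) *)

let silhouette (d : disjunct) = List.sort_uniq compare d

(* Clump: merge (here, union of edges) all disjuncts sharing a silhouette. *)
let clump (ds : disjunct list) : disjunct list =
  let tbl = Hashtbl.create 8 in
  List.iter
    (fun d ->
      let k = silhouette d in
      let old = Option.value (Hashtbl.find_opt tbl k) ~default:[] in
      Hashtbl.replace tbl k (List.sort_uniq compare (old @ d)))
    ds;
  Hashtbl.fold (fun _ d acc -> d :: acc) tbl []

let () =
  let d1 = [ ("list", "list"); ("list", "nil") ] in
  let d2 = [ ("list", "nil"); ("list", "list") ] in (* same silhouette *)
  let d3 = [ ("tree", "tree") ] in                  (* kept separate *)
  assert (List.length (clump [ d1; d2; d3 ]) = 2)
```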
Lecture Notes in Computer Science, 2013
Real-world data structures are often enhanced with additional pointers capturing alternative paths through a basic inductive skeleton (e.g., back pointers, head pointers). From the static analysis point of view, we must obtain several interlocking shape invariants. At the same time, it is well understood in abstract interpretation design that supporting a separation of concerns is critically important to designing powerful static analyses. Such a separation of concerns is often obtained via a reduced product on a case-by-case basis. In this paper, we lift this idea to abstract domains for shape analyses, introducing a domain combination operator for memory abstractions. As an example, we present simultaneous separating shape graphs, a product construction that combines instances of separation logic-based shape domains. The key enabler for this construction is a static analysis on inductive data structure definitions to derive relations between the skeleton and the alternative paths. From the engineering standpoint, this construction allows each component to reason independently about different aspects of the data structure invariant and then separately exchange information via a reduction operator. From the usability standpoint, we enable describing a data structure invariant in terms of several inductive definitions that hold simultaneously.
Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2014
Case Study: Reflection in Objective-C (slide summary): a prototype analysis implemented as a plugin for the clang static analyzer in C++, evaluated on 9 Objective-C benchmarks of 1,000 to 176,000 lines of code (6 libraries and 3 applications), with 76 annotations on system libraries and 136 manual type annotations on benchmark code.
Electronic Proceedings in Theoretical Computer Science, 2013
The aim of static analysis is to infer invariants about programs that are precise enough to establish semantic properties, such as the absence of run-time errors. Broadly speaking, there are two major branches of static analysis for imperative programs. Pointer and shape analyses focus on inferring properties of pointers, dynamically-allocated memory, and recursive data structures, while numeric analyses seek to derive invariants on numeric values. Although simultaneous inference of shape-numeric invariants is often needed, this case is especially challenging and is not particularly well explored. Notably, simultaneous shape-numeric inference raises complex issues in the design of the static analyzer itself. In this paper, we study the construction of such shape-numeric, static analyzers. We set up an abstract interpretation framework that allows us to reason about simultaneous shape-numeric properties by combining shape and numeric abstractions into a modular, expressive abstract domain. Such a modular structure is highly desirable to make its formalization and implementation easier to carry out and to get correct. To achieve this, we choose a concrete semantics that can be abstracted step-by-step, while preserving a high level of expressiveness. The structure of abstract operations (i.e., transfer, join, and comparison) follows the structure of this semantics. The advantage of this construction is to divide the analyzer into modules and functors that implement abstractions of distinct features.
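In the spirit of the modules-and-functors structure the abstract describes, a hedged OCaml sketch (interfaces hypothetical): the numeric domain is a module parameter, and the combined shape-numeric domain is a functor over it.

```ocaml
(* Sketch of a modular shape-numeric domain (hypothetical interfaces):
   a numeric abstract domain is a module, and the combined domain is a
   functor over it. *)

module type NUMERIC = sig
  type t
  val top : t
  val join : t -> t -> t
  val leq : t -> t -> bool
end

(* A trivial numeric instance: signs. *)
module Signs : NUMERIC = struct
  type t = Bot | Neg | Zero | Pos | Top
  let top = Top
  let join a b = if a = b || b = Bot then a else if a = Bot then b else Top
  let leq a b = a = b || a = Bot || b = Top
end

(* Combined domain: a shape part (here, a placeholder count of summarized
   heap regions) paired with numeric facts about its contents. *)
module ShapeNumeric (N : NUMERIC) = struct
  type t = { regions : int; num : N.t }
  let join a b = { regions = max a.regions b.regions; num = N.join a.num b.num }
  let leq a b = a.regions <= b.regions && N.leq a.num b.num
end

module SN = ShapeNumeric (Signs)
let () =
  let s1 = { SN.regions = 1; num = Signs.top } in
  let s2 = { SN.regions = 2; num = Signs.top } in
  assert (SN.leq (SN.join s1 s2) s2)
```

Swapping Signs for another NUMERIC instance changes the numeric reasoning without touching the combined domain, which is the modularity the abstract argues for.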
Lecture Notes in Computer Science, 2014
The breadth and depth of heap properties that can be inferred by the union of today's shape analyses is quite astounding. Yet, achieving scalability while supporting a wide range of complex data structures in a generic way remains a long-standing challenge. In this paper, we propose a way to side-step this issue by defining a generic abstract domain combinator for combining memory abstractions on disjoint regions. In essence, our abstract domain construction is to the separating conjunction in separation logic as the reduced product construction is to classical, non-separating conjunction. This approach eases the design of the analysis, as memory abstract domains can be re-used by applying our separating conjunction domain combinator. More importantly, this combinator enables an analysis designer to easily create a combined domain that applies computationally-expensive abstract domains only where required.
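A hedged OCaml sketch of such a combinator (signatures hypothetical): the combined state pairs two components with disjoint footprints, the separating-conjunction analogue of a reduced product.

```ocaml
(* Sketch of a separating-conjunction domain combinator (hypothetical
   signatures): two memory abstract domains each abstract a disjoint
   heap region, and the combined state is their pair. *)

module type MEM = sig
  type t
  val owns : t -> int -> bool (* does this component abstract cell i? *)
  val join : t -> t -> t
end

module SepCombine (A : MEM) (B : MEM) = struct
  type t = A.t * B.t

  (* Well-formedness check for A * B: footprints must be disjoint. *)
  let disjoint (a, b) cells =
    List.for_all (fun i -> not (A.owns a i && B.owns b i)) cells

  (* Each component joins only over its own region, so an expensive
     domain is applied only where it is required. *)
  let join (a1, b1) (a2, b2) = (A.join a1 a2, B.join b1 b2)
end

(* Toy component: remembers which cells it owns. *)
module Cells = struct
  type t = int list
  let owns t i = List.mem i t
  let join a b = List.sort_uniq compare (a @ b)
end

module P = SepCombine (Cells) (Cells)
let () =
  let s = ([ 0; 1 ], [ 2 ]) in
  assert (P.disjoint s [ 0; 1; 2 ]);
  assert (P.join s ([ 1 ], [ 3 ]) = ([ 0; 1 ], [ 2; 3 ]))
```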
ACM SIGPLAN Notices, 2008
Shape analyses are concerned with precise abstractions of the heap to capture detailed structural properties. To do so, they need to build and decompose summaries of disjoint memory regions. Unfortunately, many data structure invariants require relations be tracked across disjoint regions, such as intricate numerical data invariants or structural invariants concerning back and cross pointers. In this paper, we identify issues inherent to analyzing relational structures and design an abstract domain that is parameterized both by an abstract domain for pure data properties and by user-supplied specifications of the data structure invariants to check. Particularly, it supports hybrid invariants about shape and data and features a generic mechanism for materializing summaries at the beginning, middle, or end of inductive structures. Around this domain, we build a shape analysis whose interesting components include a pre-analysis on the user-supplied specifications that guides the abstra...
Static Analysis
Developer-supplied data structure specifications are important to shape analyses, as they tell the analysis what information should be tracked in order to obtain the desired shape invariants. We observe that data structure checking code (e.g., used in testing or dynamic analysis) provides shape information that can also be used in static analysis. In this paper, we propose a lightweight, automatic shape analysis based on these developer-supplied structural invariant checkers. In particular, we set up a parametric abstract domain, which is instantiated with such checker specifications to summarize memory regions using both notions of complete and partial checker evaluations. The analysis then automatically derives a strategy for canonicalizing or weakening shape invariants.
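A small OCaml sketch of the checker idea (the encoding is hypothetical): the same executable predicate a developer would write for testing serves as the analysis specification, in both complete and partial (segment) evaluations.

```ocaml
(* Sketch of developer-supplied checking code reused as an analysis
   specification (hypothetical encoding). [dll] is a complete checker
   run; [dll_seg] is the partial-evaluation analogue, summarizing the
   segment from a node up to a stop node. *)

type node = { data : int; mutable prev : node option; mutable next : node option }

let same_node a b =
  match a, b with
  | None, None -> true
  | Some x, Some y -> x == y
  | _ -> false

(* Complete checker: well-formed doubly-linked list starting at [n]. *)
let rec dll prev n =
  match n with
  | None -> true
  | Some x -> same_node x.prev prev && dll (Some x) x.next

(* Partial checker: well-formed segment from [n] up to node [stop]; the
   analysis summarizes memory regions with such partial evaluations. *)
let rec dll_seg prev n stop =
  match n with
  | None -> false
  | Some x ->
      x == stop || (same_node x.prev prev && dll_seg (Some x) x.next stop)

let () =
  let a = { data = 1; prev = None; next = None } in
  let b = { data = 2; prev = None; next = None } in
  a.next <- Some b;
  b.prev <- Some a;
  assert (dll None (Some a));
  assert (dll_seg None (Some a) b)
```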
Proceedings of the 2012 International Symposium on Software Testing and Analysis, 2012
A static analysis design is sufficient if it can prove the property of interest with an acceptable number of false alarms. Ultimately, the only way to confirm that an analysis design is sufficient is to implement it and run it on real-world programs. If the evaluation shows that the design is insufficient, the designer must return to the drawing board and repeat the process, wasting expensive implementation effort over and over again. In this paper, we make the observation that there is a minimal range of code needed to prove a property of interest under an ideal static analysis; we call such a range of code a validation scope. Armed with this observation, we create a dynamic measurement framework that quantifies validation scopes and thus enables designers to rule out insufficient designs at lower cost. A novel attribute of our framework is the ability to model aspects of static reasoning using dynamic execution measurements. To evaluate the flexibility of our framework, we instantiate it on an example property, null dereference errors, and measure validation scopes on real-world programs. We use a broad range of metrics that capture the difficulty of analyzing programs along varying dimensions. We also examine how validation scopes evolve as developers fix null dereference errors and as code matures. We find that bug fixes shorten validation scopes, that longer validation scopes are more likely to be buggy, and that overall validation scopes are remarkably stable as programs evolve.
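A toy OCaml sketch of the dynamic measurement (instrumentation and event format hypothetical): the validation scope of a value is approximated as the stretch of methods executed between where its non-nullness is established and where it is dereferenced.

```ocaml
(* Sketch of dynamically measuring a validation scope for null-dereference
   (hypothetical instrumentation): log where a value is last known
   non-null and where it is dereferenced; the scope is the stretch of
   methods executed in between, which an ideal static analysis would
   have to reason about. *)

type event = { meth : string; kind : [ `NonNullDef of string | `Deref of string ] }

let scope (trace : event list) (v : string) : string list =
  let rec between acc started = function
    | [] -> List.rev acc
    | e :: rest ->
        (match e.kind with
         | `NonNullDef x when x = v -> between [ e.meth ] true rest
         | `Deref x when x = v && started -> List.rev (e.meth :: acc)
         | _ -> between (if started then e.meth :: acc else acc) started rest)
  in
  between [] false trace

let () =
  let t =
    [ { meth = "A.make"; kind = `NonNullDef "p" };
      { meth = "B.helper"; kind = `Deref "q" };
      { meth = "C.use"; kind = `Deref "p" } ] in
  (* Scope of p spans A.make .. C.use: 3 methods the analysis must cover. *)
  assert (scope t "p" = [ "A.make"; "B.helper"; "C.use" ])
```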
Lecture Notes in Computer Science, 2010
Detailed memory models that expose individual fields are necessary to precisely analyze code that makes use of low-level aspects such as pointers to fields and untagged unions. Yet, higher-level representations that collect fields into records are often used because they are typically more convenient and efficient in modeling the program heap. In this paper, we present a shape graph representation of memory that exposes individual fields while largely retaining the convenience of an object-level model. This representation has a close connection to particular kinds of formulas in separation logic. Then, with this representation, we show how to extend the Xisa shape analyzer for low-level aspects, including pointers to fields, C-style nested structures and unions, malloc and free, and array values, with minimal changes to the core algorithms (e.g., materialization and summarization).
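A minimal OCaml sketch of such a representation (hypothetical encoding): each edge is an individual field points-to fact, mirroring separation-logic formulas like a.next ↦ b * a.prev ↦ c, with materialization from and summarization back to an object-level view.

```ocaml
(* Sketch of a field-exposing shape graph (hypothetical representation):
   nodes are symbolic values and each edge a.f |-> b is an individual
   field points-to fact, the graph analogue of the separation-logic
   formula a.f |-> b * a.g |-> c * ... *)

type node = int
type edge = { base : node; field : string; target : node }
type graph = edge list

(* Materialize an object-level record into per-field edges, so code using
   pointers to individual fields (or untagged unions) can be analyzed. *)
let materialize (base : node) (fields : (string * node) list) : graph =
  List.map (fun (f, t) -> { base; field = f; target = t }) fields

(* Summarize back to an object-level view: collect the fields of [b]. *)
let fields_of (g : graph) (b : node) =
  List.filter_map (fun e -> if e.base = b then Some (e.field, e.target) else None) g

let () =
  let g = materialize 1 [ ("next", 2); ("prev", 3) ] in
  assert (fields_of g 1 = [ ("next", 2); ("prev", 3) ])
```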