Oink: a Collaboration of C/C++ Tools for Static Analysis and Source-to-Source Transformation (original) (raw)
Oink:
CQual++:
Maintainer: Daniel S. Wilkerson
Developer: Karl Chen
Developer: Scott McPeak
Repository: http://github.com/dsw/oink-stack/
This documentation written by Daniel S. Wilkerson.
13 October 2014: After the Heartbleed bug came out, someone at a government lab that will not let me use their name wrote me (initially on 18 April 2014), saying:
Yes, you are interpreting me correctly. CQual++ found heartbleed while the commercial tools I tried did not. I also applied CQual++ to an important internal project and found it very effective (though a bit difficult to run and interpret) at identifying places where sanitization routines weren't being called consistently.
and
I've been applying cqual to a fairly significant library that we've been developing that does a fair amount of both network and filesystem related stuff. I don't think I've hit any false positives. There were a number of places where input was being validated but not marked as such which of course caused issue reports but that's not really a false positive.
16 February 2011: Bison 2.4.1 does not work; use bison 2.3.
3 May 2010: The documentation for Oink no longer lives at cubewano.org; it now lives at dsw.users.sonic.net/oink/index.html.
12 October 2009: Mozilla's bug fixes to Elsa to date have been merged in and Mozilla's tool Pork for using Elsa to do source-to-source transformation on C and C++ has been ported to Oink.
8 October 2009: Elsa and Oink are under development again.
9 August 2009: A few different groups of grad students at Cal Berkeley are using Elsa/Oink for projects now so I have recently done a cleanup of the code and this online documentation. Please note that we moved the source repository to github, as mentioned below; please get an up-to-date version of the source from github.
8 November 2008: Please note that Elsa and Oink are still maintained; however they are rather in a state of suspended animation. That is, none of us need any changes in them right now, so it is fine with us to leave them as they are. Mozilla, in particular Taras Glek, uses Elsa and tools they have written for Elsa to refactor their codebase. Mozilla has their own fork of the project; see Taras's Blog. Scott now works at Coverity and seems to occasionally check in fixes. If you need my (Wilkerson) attention, get my email from the image on my homepage.
Welcome to Oink!
Contents:
- Introduction to Oink (this page)
- OinkHelp: Annotated oink help output - the output of ./oink --help annotated with further details on their meaning.
- QualQuickStart: CQual++ quick-start instructions - The shortest path to findings bugs in your code.
- Qual: CQual++ Documentation - Documentation on the main Oink tool.
- QualHelp: Annotated Qual help output - The output of ./qual --help (minus the oink --help output) annotated with further details on their meaning.
- OinkManifest: A top-down view of the aspects of Oink, which files participate in each aspect, and documentation on each file.
- OinkCodingStyle - The Oink coding style; also provided is some elisp for putting emacs into the right mode.
- ElsaFeatures - The Elsa documentation is extensive; this documents a few useful features that users of Oink/Cqual++ should know about.
- Ideas for future features of Oink - If you are interested in working on Oink, here are some really non-trivial features we would like implemented.
Oink is released under the BSD license: see License for the the copyright and terms of use of the software in oink-stack.
To check out oink-stack, install git, then go to the oink-stack github page and follow instructions: http://github.com/dsw/oink-stack/
Introduction
Welcome to Oink! Oink is a collaboration of C++ static analysis tools. The C/C++ front-end for Oink is Elsa by Scott McPeak. Currently the main tool provided by Oink is CQual++, a polymorphic whole-program dataflow analysis for C++. CQual++ was inspired by Jeff Foster's Cqual tool and shares the backend solver with it.
Oink aims to be
- industrial-strength for immediate utility in finding bugs,
- extensible for ease in adding backends, and
- composable for ease in combining existing backends.
Oink computes both
- expression-level and type-level dataflow, and
- statement-level intra-procedural controlflow (by delegating to Elsa)
It easy to get started by using the two demo backends that print graphs of these flows. Oink also comes with a client of the dataflow analysis that does type qualifier inference: Cqual++, a C/C++ frontend for Cqual Whole-program analyses may be attempted using the linker imitator.
History and previous work
The primary tool in Oink is Cqual++. Cqual was the original tool that lead to the development of Oink/Cqual++. Cqual++ is not really a fork or a second version of Cqual as everything was completely re-written; however the design was informed by the close connection with the Cqual developers and we appreciate their co-operation.
The oink front-end, Elsa, is completely original work by Scott McPeak, whereas the Cqual front-end was a version of gcc 2.8.1 hacked by David Gay. The qualifier dataflow backend, libqual, is a polymorphic re-implementation by Rob Johnson of Jeff Foster monomorphic backend. The oink dataflow analysis was written by Daniel S. Wilkerson and is separate from the libqual backend so that it can be re-used. Only David Gay's libregion library was re-used as-is. Significant testing, bug fixes, and optimizations and generalizations were made by Karl Chen starting late February 2006 making Oink much more industrial strength.
The original work on Cqual was supported by Alex Aiken. The work on Oink and Cqual++ was supported by Alex Aiken and David Wagner. Without their ongoing support the project would not have happened.
- Delta assists you in minimizing "interesting" files subject to a test of their interestingness. A common such situation is when attempting to isolate a small failure-inducing substring of a large input that causes your program to exhibit a bug.
- Build Interceptor captures the .i files of any project while it is built from source using the gcc tool-chain.
- Compound Init is a small package for implementing the semantics of Compound Initializer syntax as described in the C99 standard, section 6.7.8, including Designated Initializers and the gcc Range Initializer extension to the syntax. That is, Compound Init maps from 1) a type tree and 2) a compound initializer abstract syntax tree to a sequence of pairs of leaf types and leaf initializers. Compound Init is designed for maximum ease of inclusion into an existing compiler. It was a separate project, but is now just the LibCpdInit subdirectory of oink.
Articles
The following articles used the Oink infrastructure. There is no Oink article to cite (yet?), which is why you may not see a citation for it.
- Large-Scale Analysis of Format String Vulnerabilities in Debian Linux. Karl Chen and David Wagner. In Proceedings of the ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS 2007), June 14, 2007.
- Preventing Secret Leakage from fork(): Securing Privilege-Separated Applications. Umesh Shankar and David Wagner. In Proceedings of the 2006 IEEE International Conference on Communications (Network Security and Information Assurance Symposium at ICC 2006), June 2006.
- Scrash: A System for Generating Secure Crash Information. Pete Broadwell, Matthew Harren, and Naveen Sastry. In Proceedings of the 12th USENIX Security Symposium, August 2003.
Talks
- Elsa/Oink/Cqual++ CodeCon 2006 (ppt) - slides on Elsa/Oink/Cqual++ for talk given by Scott McPeak and Daniel Wilkerson at CodeCon 2006.
- OSQ 2005-01-28 slides (PS) - slides for talk at OSQ lunch on 2005-01-28 using Oink and other tools
- Taras Glek, September 10, 2007: Automatic Decomtamination: Roadmap for Automated Refactorings: Mozilla using Elsa to do automated refactoring of their codebase.
- Mozilla Static Analysis page: Applications for Oink static analysis tools for Mozilla 2. Note that this work is suspended.
- Brendan Eich, October 13, 2006: Brendan's Roadmap Updates: Mozilla 2
- Robert O'Callahan, September 14, 2006: Well, I'm Back: Static Analysis And Scary Headlines
The Oink-Stack repository
You need an entire collection of repositories in order to build oink. Oink-stack is a meta repository that contains git submodule links to the individual repositories below, and scripts to use them.
- smbase: Scott McPeak's replacement for the C++ container and string libraries.
- ast: Scott McPeak's tool for generating AST classes from a specification language.
- elkhound: Scott McPeak's tool for generating a parser for a GLR grammar.
- elsa: Scott McPeak's C and C++ front-end.
- libregion: David Gay's C region-based memory management library.
- libqual: Rob Johnson's serializable polymorphic type qualifier inference engine based on Jeff Foster's original monomorphic type qualifier inference engine, cqual.
- platform-model: Karl Chen's static model of libc and libstdc++.
- oink: Daniel S. Wilkerson's join-point for front-end and back-end coordination. Client applications are built in this directory. In particular, contains Cqual++.
More generally, the Oink-Stack repository is what we call a "software stack": it is a collection of repositories that together form one large project of composable parts. The name is reminiscent of the TCP/IP networking stack, which we think of as being organized this way.
Getting the code
To check out oink-stack, install git, then go to the oink-stack github page and follow instructions: http://github.com/dsw/oink-stack/
Building
See above for dependencies. To build everything, in oink-stack/ type:
./configure && make clean all check
Upon successful build, you should see
================================================================ DONE in all directories: check
The oink-stack repository has a meta-configure file that will allow you to specify a configuration "mode" for the whole project. When it is run all of the sub-repository configure scripts are then called with arguments such that the whole project is built in one way consistent with that mode.
- The default mode is fastdebug, which is what you want for debugging: the resulting code is reasonably fast and can still be run in gdb.
- To build for performance, use the perform mode.
There are other modes; for an exhaustive and a usage message, in oink-stack run ./configure -h
If you have problems with the platform-model tests failing with a segmentation fault, this seems to often be due to problems with libzipios++ and zlib. Make sure that your version of these libraries was built with the same version of the compiler that you are building Oink with. Another solution is to just turn off zip archiving, falling back to a more primitive method. Do this from the oink-stack directory as follows.
./configure +platform-model:--archive-srz=dir +oink:-require-no-archive-srz-zip
Licensing
Oink is released under the BSD license: see License for the the copyright and terms of use of the software in oink-stack.
Contributing
Your act of submitting software or a software patch to us constitutes your agreement to us that you have the legal right to release it into the public domain and that you have released it into the public domain.
Dependencies
Dependencies required to compile Oink:
- gcc: 3.2, 3.4, 4.0, 4.1 known to work. gcc 2.95.3 no longer works because our C code is no longer C89 compatible.
- flex: 2.5.4, 2.5.31, 2.5.33 known to work.
- bison: 1.35, 2.1, 2.3 known to work. NOTE: bison 2.4.1 known to not work.
The oink-stack/platform-model repository requires the following:
- python2.4
The following are optional; you get extended functionality if you have them.
- dot: 1.8.10, 2.7 known to work; Get dot from here: http://www.graphviz.org/ . You do not need dot for the base functionality of the system. If we have an internal graph to print out, such as the output of -fq-print-quals-graph, it will be rendered out in dot format and you will need dot to convert that to postscript.
- gcc-3.4 to use platform-model for C++ (Elsa does not support STL headers from g++ 4.0 or g++ 3.3; please see platform-model/doc/gcc.txt.
- libzipios++ and zlib: libzipios++ 0.1.5.9 known to work, zlib 1.2.3 known to work. Enable loading and saving .qz archive files, in addition to .qdir archive directories.
Recently we changed the default configuration so that the oink-stack/platform-model repository is *not* built by default. Therefore the out-of-the-box build should succeed on any Unix-like platform. If you do not want to use platform-model then you will have to provide your own annotations for the platform libraries. If the parts of your code that talk to external libraries are small, you can annotate them instead and you might get along just fine. You will also have to use raw oink-stack/oink/qual instead of the wrapper script oink-stack/oink-scripts/bin/qualx.
Oink is a framework in which to build tools. To make Oink easy to learn and build-upon, I have written some small examples. Each sample tool is a stand-alone executable and does not share infrastructure with other unrelated tools, making it smaller and easier to understand. Oink also provides the Cqual++ tool, a polymorphic whole-program dataflow analysis for C++.
Oink tool
The Oink tool, oink/oink, is the empty analysis: it does nothing. Its features and command-line flags are shared by all other tools. For an exhaustive list of the oink/oink flags and a short description of each, just type: ./oink --help
Static Print tool
The Static Print tool, oink/staticprint, is intended to print out various facts about a program that can be obtained statically. Currently it prints the inheritance tree and a histogram of AST nodes. Run 'make staticprint-check-print' and then view Test/staticprint1.cc.ihg.dot.ps in a postscript viewer and compare with the source code in Test/staticprint1.cc. See the staticprint_test.incl.mk makefile for other examples. Note that currently this tree is translation-unit-local since staticprint has no linker imitator.
Dfg Print tool
The Dfg Print tool, oink/dfgprint, prints the data flow graph of a C++ program. It is purposefully small and simple so it is easy to see the essentials of writing a dataflow tool. It should be easy to get started on a dataflow analysis by simply copying dfgprint and hacking it. Run 'make dfgprint-check-print' and then view Test/dfgprint1.c.dfg.dot.ps in a postscript viewer and compare with the source code in Test/dfgprint1.c . Note that the graph generation requires that you have the graph layout program 'dot'.
Note that dfgprint prints 'expression-level' dataflow. The dataflow used by the Cqual++ analysis (described below) is the more detailed 'type-level' flow; you can get a graph of that flow by running Cqual++ with the flag -fprint-quals-graph and then running 'dot' on the resulting quals.dot file.
Cfg Print tool
The Cfg Print tool, oink/cfgprint, prints the intra-procedural control flow graph of a C++ program (by delegating to that functionality in Elsa). Similarly to dfgprint it also is purposefully small and simple. Run 'make cfgprint-check-print' and then view Test/cfgprint1.c.cfg.dot.ps in a postscript viewer and compare with the source code in Test/cfgprint1.c .
Note that cfgprint prints a statement-level control flow graph; a more precise expression-level graph could be implemented as (mostly) a refinement of the statement-level flow, but this is not done yet. (It would not be a strict refinement due to the possibility of using a goto to jump out of a gnu expression statement; Yes there is an example of this in the Red Hat 9 source tree.) We don't compute the (inter-procedural) call graph as to do that properly really requires including also a points-to analysis and a dataflow analysis.
Cqual++ tool
The major tool of the current Oink distribution is Cqual++, oink/qual, which checks static assertions about C++ programs that can be expressed as a polymorphic dataflow of type qualifiers. Cqual++ uses libqual to do the inference. Please see Qual for more information.