Oink: a Collaboration of C/C++ Tools for Static Analysis and Source-to-Source Transformation (original) (raw)

Repository: http://github.com/dsw/oink-stack/


This documentation written by Daniel S. Wilkerson.

13 October 2014: After the Heartbleed bug came out, someone at a government lab that will not let me use their name wrote me (initially on 18 April 2014), saying:

Yes, you are interpreting me correctly. CQual++ found heartbleed while the commercial tools I tried did not. I also applied CQual++ to an important internal project and found it very effective (though a bit difficult to run and interpret) at identifying places where sanitization routines weren't being called consistently.

and

I've been applying cqual to a fairly significant library that we've been developing that does a fair amount of both network and filesystem related stuff. I don't think I've hit any false positives. There were a number of places where input was being validated but not marked as such which of course caused issue reports but that's not really a false positive.

16 February 2011: Bison 2.4.1 does not work; use bison 2.3.

3 May 2010: The documentation for Oink no longer lives at cubewano.org; it now lives at dsw.users.sonic.net/oink/index.html.

12 October 2009: Mozilla's bug fixes to Elsa to date have been merged in and Mozilla's tool Pork for using Elsa to do source-to-source transformation on C and C++ has been ported to Oink.

8 October 2009: Elsa and Oink are under development again.

9 August 2009: A few different groups of grad students at Cal Berkeley are using Elsa/Oink for projects now so I have recently done a cleanup of the code and this online documentation. Please note that we moved the source repository to github, as mentioned below; please get an up-to-date version of the source from github.

8 November 2008: Please note that Elsa and Oink are still maintained; however they are rather in a state of suspended animation. That is, none of us need any changes in them right now, so it is fine with us to leave them as they are. Mozilla, in particular Taras Glek, uses Elsa and tools they have written for Elsa to refactor their codebase. Mozilla has their own fork of the project; see Taras's Blog. Scott now works at Coverity and seems to occasionally check in fixes. If you need my (Wilkerson) attention, get my email from the image on my homepage.


Welcome to Oink!

Contents:

Oink is released under the BSD license: see License for the the copyright and terms of use of the software in oink-stack.

To check out oink-stack, install git, then go to the oink-stack github page and follow instructions: http://github.com/dsw/oink-stack/

Introduction

Welcome to Oink! Oink is a collaboration of C++ static analysis tools. The C/C++ front-end for Oink is Elsa by Scott McPeak. Currently the main tool provided by Oink is CQual++, a polymorphic whole-program dataflow analysis for C++. CQual++ was inspired by Jeff Foster's Cqual tool and shares the backend solver with it.

Oink aims to be

  1. industrial-strength for immediate utility in finding bugs,
  2. extensible for ease in adding backends, and
  3. composable for ease in combining existing backends.

Oink computes both

  1. expression-level and type-level dataflow, and
  2. statement-level intra-procedural controlflow (by delegating to Elsa)

It easy to get started by using the two demo backends that print graphs of these flows. Oink also comes with a client of the dataflow analysis that does type qualifier inference: Cqual++, a C/C++ frontend for Cqual Whole-program analyses may be attempted using the linker imitator.

History and previous work

The primary tool in Oink is Cqual++. Cqual was the original tool that lead to the development of Oink/Cqual++. Cqual++ is not really a fork or a second version of Cqual as everything was completely re-written; however the design was informed by the close connection with the Cqual developers and we appreciate their co-operation.

The oink front-end, Elsa, is completely original work by Scott McPeak, whereas the Cqual front-end was a version of gcc 2.8.1 hacked by David Gay. The qualifier dataflow backend, libqual, is a polymorphic re-implementation by Rob Johnson of Jeff Foster monomorphic backend. The oink dataflow analysis was written by Daniel S. Wilkerson and is separate from the libqual backend so that it can be re-used. Only David Gay's libregion library was re-used as-is. Significant testing, bug fixes, and optimizations and generalizations were made by Karl Chen starting late February 2006 making Oink much more industrial strength.

The original work on Cqual was supported by Alex Aiken. The work on Oink and Cqual++ was supported by Alex Aiken and David Wagner. Without their ongoing support the project would not have happened.

Articles

The following articles used the Oink infrastructure. There is no Oink article to cite (yet?), which is why you may not see a citation for it.

Talks


The Oink-Stack repository

You need an entire collection of repositories in order to build oink. Oink-stack is a meta repository that contains git submodule links to the individual repositories below, and scripts to use them.

More generally, the Oink-Stack repository is what we call a "software stack": it is a collection of repositories that together form one large project of composable parts. The name is reminiscent of the TCP/IP networking stack, which we think of as being organized this way.

Getting the code

To check out oink-stack, install git, then go to the oink-stack github page and follow instructions: http://github.com/dsw/oink-stack/

Building

See above for dependencies. To build everything, in oink-stack/ type:

./configure && make clean all check

Upon successful build, you should see

================================================================ DONE in all directories: check

The oink-stack repository has a meta-configure file that will allow you to specify a configuration "mode" for the whole project. When it is run all of the sub-repository configure scripts are then called with arguments such that the whole project is built in one way consistent with that mode.

There are other modes; for an exhaustive and a usage message, in oink-stack run ./configure -h

If you have problems with the platform-model tests failing with a segmentation fault, this seems to often be due to problems with libzipios++ and zlib. Make sure that your version of these libraries was built with the same version of the compiler that you are building Oink with. Another solution is to just turn off zip archiving, falling back to a more primitive method. Do this from the oink-stack directory as follows.

./configure +platform-model:--archive-srz=dir +oink:-require-no-archive-srz-zip


Licensing

Oink is released under the BSD license: see License for the the copyright and terms of use of the software in oink-stack.

Contributing

Your act of submitting software or a software patch to us constitutes your agreement to us that you have the legal right to release it into the public domain and that you have released it into the public domain.

Dependencies

Dependencies required to compile Oink:

The oink-stack/platform-model repository requires the following:

The following are optional; you get extended functionality if you have them.

Recently we changed the default configuration so that the oink-stack/platform-model repository is *not* built by default. Therefore the out-of-the-box build should succeed on any Unix-like platform. If you do not want to use platform-model then you will have to provide your own annotations for the platform libraries. If the parts of your code that talk to external libraries are small, you can annotate them instead and you might get along just fine. You will also have to use raw oink-stack/oink/qual instead of the wrapper script oink-stack/oink-scripts/bin/qualx.

Oink is a framework in which to build tools. To make Oink easy to learn and build-upon, I have written some small examples. Each sample tool is a stand-alone executable and does not share infrastructure with other unrelated tools, making it smaller and easier to understand. Oink also provides the Cqual++ tool, a polymorphic whole-program dataflow analysis for C++.

Oink tool

The Oink tool, oink/oink, is the empty analysis: it does nothing. Its features and command-line flags are shared by all other tools. For an exhaustive list of the oink/oink flags and a short description of each, just type: ./oink --help

Static Print tool

The Static Print tool, oink/staticprint, is intended to print out various facts about a program that can be obtained statically. Currently it prints the inheritance tree and a histogram of AST nodes. Run 'make staticprint-check-print' and then view Test/staticprint1.cc.ihg.dot.ps in a postscript viewer and compare with the source code in Test/staticprint1.cc. See the staticprint_test.incl.mk makefile for other examples. Note that currently this tree is translation-unit-local since staticprint has no linker imitator.

Dfg Print tool

The Dfg Print tool, oink/dfgprint, prints the data flow graph of a C++ program. It is purposefully small and simple so it is easy to see the essentials of writing a dataflow tool. It should be easy to get started on a dataflow analysis by simply copying dfgprint and hacking it. Run 'make dfgprint-check-print' and then view Test/dfgprint1.c.dfg.dot.ps in a postscript viewer and compare with the source code in Test/dfgprint1.c . Note that the graph generation requires that you have the graph layout program 'dot'.

Note that dfgprint prints 'expression-level' dataflow. The dataflow used by the Cqual++ analysis (described below) is the more detailed 'type-level' flow; you can get a graph of that flow by running Cqual++ with the flag -fprint-quals-graph and then running 'dot' on the resulting quals.dot file.

Cfg Print tool

The Cfg Print tool, oink/cfgprint, prints the intra-procedural control flow graph of a C++ program (by delegating to that functionality in Elsa). Similarly to dfgprint it also is purposefully small and simple. Run 'make cfgprint-check-print' and then view Test/cfgprint1.c.cfg.dot.ps in a postscript viewer and compare with the source code in Test/cfgprint1.c .

Note that cfgprint prints a statement-level control flow graph; a more precise expression-level graph could be implemented as (mostly) a refinement of the statement-level flow, but this is not done yet. (It would not be a strict refinement due to the possibility of using a goto to jump out of a gnu expression statement; Yes there is an example of this in the Red Hat 9 source tree.) We don't compute the (inter-procedural) call graph as to do that properly really requires including also a points-to analysis and a dataflow analysis.

Cqual++ tool

The major tool of the current Oink distribution is Cqual++, oink/qual, which checks static assertions about C++ programs that can be expressed as a polymorphic dataflow of type qualifiers. Cqual++ uses libqual to do the inference. Please see Qual for more information.