[llvm-dev] Contributing a new sanitizer for pointer casts
Stephen Kell via llvm-dev llvm-dev at lists.llvm.org
Tue Apr 25 06:54:12 PDT 2017
Hi all,
Some of you might remember that at EuroLLVM last year in Barcelona, Chris Diamand and I gave a talk about Clang/libcrunch, a run-time checking system which can be thought of as another flavour of sanitizer. It checks pointer casts, using run-time type information. Roughly, the check is that the pointer really points to an instance of the target type, though there are refinements to deal with various idioms that violate this rule. The talk is listed at <http://www.llvm.org/devmtg/2016-03/#presentation9>.
(I dropped a mention of this in the recent TBAA sanitizer thread, but consensus was that on balance it's a different enough tool to want both.)
My current research funding has some room for tech transfer activity, so I've been spending some time on improving the code, with a hope of eventually contributing it to LLVM.
This mail is just to get a handle on two questions: how much interest is there in this, and what changes are most important in order to get something contributable?
The system is a bit complex, so let me give you an overview of how it currently works. (If you want full technical details, there are a couple of research papers you could read -- see the bottom.)
Instrumentation: this adds checks on (most) pointer casts, and in a few other places. It also does a little source-level analysis to dump information about allocation sites. We have both (my original) CIL and (Chris's) Clang/LLVM implementations of this. The Clang version is not too pretty at present: it uses -include'd inline helper functions written in C and shared with the CIL implementation. It also requires a bit of a hack to propagate certain type info (in uses of "sizeof") onwards to LLVM so it can be used in a data-flow analysis.
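To make the idea of a checked cast concrete, here is a minimal sketch in C. It is not libcrunch's actual interface: every name (toy_type, toy_register, toy_is_a, toy_check_cast) is hypothetical, the "runtime" is a small table filled in by hand, and the report-and-continue behaviour is an assumption of the sketch. It only illustrates the shape of the check: the instrumented cast asks whether the pointed-to object is an instance of the target type.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified type descriptor (not the real one). */
struct toy_type { const char *name; size_t size; };

static const struct toy_type toy_int   = { "int",   sizeof(int) };
static const struct toy_type toy_float = { "float", sizeof(float) };

/* Toy allocation table: maps an object's base address to its type.
 * In the sketch it is filled in by hand via toy_register(). */
struct toy_alloc { const void *base; const struct toy_type *type; };

#define TOY_MAX_ALLOCS 16
static struct toy_alloc toy_allocs[TOY_MAX_ALLOCS];
static size_t toy_nallocs;
static unsigned toy_failed_checks;

static int toy_register(const void *base, const struct toy_type *type)
{
    if (toy_nallocs == TOY_MAX_ALLOCS) return 0;
    toy_allocs[toy_nallocs].base = base;
    toy_allocs[toy_nallocs].type = type;
    ++toy_nallocs;
    return 1;
}

/* Returns 1 if obj is a registered object of exactly type t, else 0. */
static int toy_is_a(const void *obj, const struct toy_type *t)
{
    for (size_t i = 0; i < toy_nallocs; ++i)
        if (toy_allocs[i].base == obj)
            return toy_allocs[i].type == t;
    return 0; /* unknown object: treated as a failure in this sketch */
}

/* What an instrumented cast might call: check, report, then continue. */
static void *toy_check_cast(void *p, const struct toy_type *target)
{
    if (!toy_is_a(p, target)) {
        ++toy_failed_checks;
        fprintf(stderr, "cast check failed: %p is not a %s\n", p, target->name);
    }
    return p;
}

/* Small driver: one cast that passes and one that is flagged.
 * Returns the number of failed checks. */
static unsigned toy_demo(void)
{
    static int counter = 42;
    toy_nallocs = 0;
    toy_failed_checks = 0;
    toy_register(&counter, &toy_int);

    void *p = &counter;
    int *ok = (int *) toy_check_cast(p, &toy_int);        /* passes */
    float *bad = (float *) toy_check_cast(p, &toy_float); /* flagged */
    (void) ok;
    (void) bad; /* never dereferenced */
    return toy_failed_checks;
}
```

Calling toy_demo() from a main function reports the second cast and leaves the first alone. The real checks handle many cases this sketch ignores, such as pointers into the middle of an object.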
Hints from the programmer: these are necessary to declare allocation functions, besides standard ones (malloc etc.). This is currently done with an environment variable (LIBALLOCS_ALLOC_FNS) though I've thought of adding a command-line option too. These declarations have effect at both compile and link time.
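As an illustration of why such hints are needed, consider a project-specific allocation wrapper. The sketch below is ordinary C; my_xmalloc, struct widget and make_widget are hypothetical names invented for the example. Without a hint, the call in make_widget is just a call to an unknown function. Once my_xmalloc is declared as an allocation function, the sizeof at the call site can be used to work out what type is being allocated there. The exact syntax of LIBALLOCS_ALLOC_FNS is deliberately not shown, since it is not described in this mail.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical project-specific allocator: aborts on failure and
 * returns zeroed memory. A tool cannot know this allocates unless told. */
static void *my_xmalloc(size_t size)
{
    void *p = malloc(size);
    if (!p) abort();
    memset(p, 0, size);
    return p;
}

struct widget { int id; double weight; };

/* Allocation site: the sizeof expression is the only source-level
 * evidence that the returned memory will hold a struct widget. */
static struct widget *make_widget(int id)
{
    struct widget *w = my_xmalloc(sizeof(struct widget));
    w->id = id;
    return w;
}
```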
Compiler wrapper and helper tools: currently a mixture of shell, Python and C++ helpers building on a pile of my own libraries (libdwarfpp, dwarfidl, liballocstool), for DWARF analysis and postprocessing. Roughly, these are responsible for generating and linking the run-time type information itself.
Type information: these are autogenerated, uniqued / COMDAT'd instances of a moderately complex (but compact) C struct, one for each distinct data type. The model of type info (but not the representation) is somewhat DWARF-inspired.
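For a rough feel of what a per-type record could look like, here is a sketch in C. The layout is an assumption made for illustration, not the real struct, and all names (sketch_type, sketch_member, sketch_contains_at) are hypothetical. Because there is one record per distinct type, two types can be compared by comparing record addresses, which is what the uniquing buys. The helper also shows one plausible kind of refinement: a pointer to a struct may legitimately be viewed as a pointer to a member located at the same address.

```c
#include <stddef.h>

enum sketch_kind { SKETCH_BASE, SKETCH_STRUCT };

struct sketch_type;

/* One member of a composite type: where it lives and what it is. */
struct sketch_member { size_t offset; const struct sketch_type *type; };

/* Hypothetical per-type record; exactly one instance per distinct type. */
struct sketch_type {
    enum sketch_kind kind;
    const char *name;
    size_t size;
    size_t nmembers;                     /* 0 for base types */
    const struct sketch_member *members; /* NULL for base types */
};

static const struct sketch_type sketch_int =
    { SKETCH_BASE, "int", sizeof(int), 0, NULL };
static const struct sketch_type sketch_double =
    { SKETCH_BASE, "double", sizeof(double), 0, NULL };

struct point { int id; double x; };

static const struct sketch_member sketch_point_members[] = {
    { offsetof(struct point, id), &sketch_int },
    { offsetof(struct point, x),  &sketch_double },
};
static const struct sketch_type sketch_point =
    { SKETCH_STRUCT, "point", sizeof(struct point), 2, sketch_point_members };

/* Does an object of type `whole` contain a subobject of type `want`
 * starting exactly at byte `offset`? Types are compared by address. */
static int sketch_contains_at(const struct sketch_type *whole, size_t offset,
                              const struct sketch_type *want)
{
    if (offset == 0 && whole == want) return 1;
    if (whole->kind != SKETCH_STRUCT) return 0;
    for (size_t i = 0; i < whole->nmembers; ++i) {
        const struct sketch_member *m = &whole->members[i];
        if (offset >= m->offset && offset - m->offset < m->type->size)
            return sketch_contains_at(m->type, offset - m->offset, want);
    }
    return 0;
}
```

In this sketch the records are written by hand; in the system described above they are generated from debugging information and uniqued at link time.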
Runtime: this is a preloadable shared library which does the dispatching of the checks. It also gets its hooks into various places to load type info as necessary, and to observe various kinds of allocation happening within the process. Again it builds on a pile of my other stuff (liballocs, which builds on trap-syscalls, mallochooks, libdlbind).
Currently, my plan in a nutshell is to eliminate the C inline helpers in favour of fully IR-level instrumentation, and also eliminate the compiler wrapper in favour of a gold plugin (and maybe a bit of help in the clang driver). This should result in a contributable diff that adds a new sanitizer option (currently "-fsanitize=crunch", but name negotiable :-). Binaries built this way will also require the gold plugin and runtime (both out-of-tree) to do useful checking.
I don't intend to port the runtime. Although in principle this could share code with the sanitizer runtimes, that's a lot of work and I don't have the resources to take it on right now. Barring major rewrites, the runtime pretty much has to be GPL-licensed anyway, since it borrows code from glibc and Xen (for purposes I'm pretty sure are not covered by the sanitizer runtimes).
So my questions for you are whether this contribution would be welcome, and in particular any red lines about how to do instrumentation, how to factor everything, and how to deal with the external dependencies. As I currently envisage things, the gold plugin must live out-of-tree since it will require my libraries to build; I don't believe equivalent library support exists within LLVM. This being out-of-tree seems not a huge loss given that the runtime also will be.
Oh, and runtime support exists for x86-64/Linux only at the moment, though there is a bit of code for FreeBSD.
For the interested, here are the research papers I mentioned.
"Dynamically diagnosing run-time type errors in unsafe code" (OOPSLA '16) http://www.cl.cam.ac.uk/~srk31/#oopsla16a
"Towards a dynamic object model within Unix processes" (Onward! '15) http://www.cl.cam.ac.uk/~srk31/#onward15
Code: <https://github.com/stephenrkell/liballocs> <https://github.com/stephenrkell/libcrunch> <https://github.com/stephenrkell/clangcrunch>.
All thoughts appreciated... let me know if you see any obstacles to contribution, or if you're able to help, or just if you have questions. Much obliged,
Stephen.