[RFC] Introducing Instrumentor: Easily Customizable Code Instrumentation

Code instrumentation is a widely used technique for tracking applications’ behavior. Debugging, sanitization, logging of events, and performance analysis are some common uses. Typically, instrumenting a program involves modifying its original code by inserting extra code to retrieve information, which is forwarded to a runtime component (e.g., a library) that collects the data for online or offline processing.

LLVM, like other compiler infrastructures, lacks a generic mechanism to instrument code. Numerous LLVM passes use custom logic for slightly different but generally similar instrumentation (e.g., the sanitizers). This situation hurts code maintainability, increases code duplication, and, worse yet, hinders the development of new instrumentation-based tools.

Instrumentor

We introduce the Instrumentor, a new LLVM pass that allows instrumenting code in a simple and customizable way. We want to add our pass to the LLVM project so that developers and users can start building their instrumentation-based tools using this generic solution, rather than developing custom instrumentation passes from scratch. The addition of this pass does not alter any existing component of the LLVM project.

The Instrumentor can be used from within LLVM, like any other pass or component, or from any frontend, via a descriptive JSON user interface. The JSON configuration file defines which instrumentation opportunities (e.g., instruction kinds) should be instrumented and what information is forwarded. The pass then inserts calls to instrumentation functions and passes all the requested information as arguments. A runtime component is needed to implement these instrumentation functions and to act on the forwarded information. The Instrumentor may also be customized programmatically; any other LLVM pass can use the Instrumentor pass and all the encapsulated instrumentation opportunities to fine-tune its behavior.

The Instrumentor aims to provide a unified and simple method for instrumenting code, reducing maintenance costs and code duplication, and paving the way for future instrumentation-based tools. The pass has been designed to be easily extensible so that it can support other instrumentation opportunities in the future.

Instrumenting using the JSON file

The simplest way to instrument code with the Instrumentor is to provide a JSON configuration file. This approach requires no code changes in the compiler pipeline. We use the LLVM IR below to show how it works.

define i32 @myfunc(ptr %p) {
  %v = load i32, ptr %p, align 8
  store i32 10, ptr %p, align 8
  ret i32 %v
}

In this example, we want to instrument load operations just before they are executed and provide related information to the runtime component. To this end, we supply the Instrumentor with the following JSON configuration file. This instructs the Instrumentor to insert instrumentation calls (with the __instr_ prefix) just before load operations and forward the following information: the pointer operand, the access size, the alignment, and whether it is volatile. Additionally, the Instrumentor is instructed to replace the pointer operand of the load with the pointer provided by the runtime component.

{
  "configuration": {
    "runtime_prefix": "__instr_"
  },
  "instruction_pre": {
    "load": {
      "enabled": true,
      "pointer": true,
      "pointer.replace": true,
      "pointer_as": false,
      "value_size": true,
      "alignment": true,
      "is_volatile": true
    }
  }
}

The resulting code after the Instrumentor pass is shown below. The instrumentation call forwards the requested information about the load before it is executed, and the load instruction is modified to take the new pointer %np (provided by the runtime component) as the pointer operand.

declare ptr @__instr_pre_load(ptr, i32, i32, i32)

define i32 @myfunc(ptr %p) {
  %np = call ptr @__instr_pre_load(ptr %p, i32 4, i32 8, i32 0)
  %v = load i32, ptr %np, align 8
  store i32 10, ptr %p, align 8
  ret i32 %v
}
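
A runtime component for this configuration could look like the following minimal C++ sketch. The `__instr_pre_load` name and its parameter order are taken from the call in the instrumented IR above; the counters and the `instr_num_*` accessors are hypothetical additions for illustration only, and the stub that the Instrumentor generates may declare the function differently.

```cpp
// Hypothetical runtime for the load configuration above (a sketch, not the
// stub generated by the Instrumentor).
#include <atomic>
#include <cstdint>

// Illustrative bookkeeping; these counters are not part of the RFC.
static std::atomic<std::uint64_t> NumLoads{0};
static std::atomic<std::uint64_t> NumMisaligned{0};

// Called before every instrumented load. The parameters mirror the call in
// the instrumented IR: pointer, value size, alignment, is-volatile.
extern "C" void *__instr_pre_load(void *Ptr, std::int32_t ValueSize,
                                  std::int32_t Alignment,
                                  std::int32_t IsVolatile) {
  (void)ValueSize;  // Unused in this sketch.
  (void)IsVolatile; // Unused in this sketch.
  NumLoads.fetch_add(1, std::memory_order_relaxed);

  // Count loads whose address does not match the alignment stated in the IR.
  if (Alignment > 0 && reinterpret_cast<std::uintptr_t>(Ptr) %
                               static_cast<std::uintptr_t>(Alignment) != 0)
    NumMisaligned.fetch_add(1, std::memory_order_relaxed);

  // "pointer.replace" makes the load use the returned pointer; returning the
  // original pointer leaves the program's behavior unchanged.
  return Ptr;
}

// Hypothetical accessors a tool could use to report the collected data.
extern "C" std::uint64_t instr_num_loads() { return NumLoads.load(); }
extern "C" std::uint64_t instr_num_misaligned() { return NumMisaligned.load(); }
```

Because the function returns `Ptr` unchanged, the instrumented program behaves as before. A tool that redirects accesses, for example to a shadow copy of the data, would return a different pointer instead.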

Instrumenting from another LLVM pass

Running the Instrumentor from another LLVM pass is an alternative that provides finer control over which opportunities are instrumented. The code below, which would live in an LLVM pass, configures the instrumentation of load operations with the same options as in the previous JSON file. CB is an optional callback that decides, for each load, whether it should be instrumented.

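// Start with every option disabled, then opt in to the ones we need.
LoadIO::ConfigTy LConfig(/*Enable=*/false);
LConfig.PassPointer = true;
LConfig.ReplacePointer = true;
LConfig.PassValueSize = true;
LConfig.PassAlignment = true;
LConfig.PassIsVolatile = true;

// Instrument loads before they execute (the "instruction_pre" position).
LoadIO *LIO = new LoadIO(/*IsPRE=*/true);
// Optional filter; shouldInstrumentLoad is a user-provided predicate.
LIO->CB = [&](Value &V) {
  return shouldInstrumentLoad(cast<LoadInst>(V));
};
LIO->init(&LConfig);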

Summary of Instrumentor features

Multiple instrumentation opportunities: instructions (loads, stores, allocas, calls, etc.), function enter/exit, global variables, and module constructor/destructor.
Optional replacement of certain operands with runtime-provided values (e.g., the pointer operand in stores).
Instrumentation of function calls and function enter/exit allows the inspection of function arguments and the replacement of their values.
Automatic generation of a stub runtime source file for any instrumentation configuration to simplify the development of new tools.
Optional inlining of the runtime implementation (i.e., a bitcode file) into the instrumented user code to avoid instrumentation calls.
Using the Instrumentor programmatically allows finer instrumentation control, such as skipping the instrumentation of opportunities using a filtering callback or passing custom information to instrumentation functions.

We welcome any feedback on this new generic instrumentation pass. Please feel free to ask questions or suggest features that the Instrumentor could provide.

Authors / contributors of the Instrumentor: Johannes Doerfert (@jdoerfert), Kevin Sala, Ivan Ivanov (@ivanradanov), Ethan Luis McDonough (@ethanluismcdonough).