Constant Interpreter — Clang 22.0.0git documentation

Introduction

The constexpr interpreter aims to replace the existing tree evaluator in clang, improving performance on constructs which are executed inefficiently by the evaluator. The interpreter is activated using the following flags:

Bytecode Compilation

Bytecode compilation is handled in Compiler.h for both statements and expressions. The compiler has two different backends: one to generate bytecode for functions (ByteCodeEmitter) and one to directly evaluate expressions as they are compiled, without generating bytecode (EvalEmitter). All functions are compiled to bytecode, while top-level expressions used in constant contexts are evaluated directly, since their bytecode would never be reused. This mechanism aims to pave the way towards replacing the evaluator, improving its performance on functions and loops, while being just as fast on single-use top-level expressions.
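The split between the two backends can be pictured with a small toy model. The sketch below is not the code in Compiler.h: the names RecordingEmitter, DirectEmitter, compileExpr and run, and the three opcodes, are all hypothetical. It only illustrates the idea of writing the lowering once against an emitter interface, so that one backend stores instructions for reuse while the other evaluates them immediately.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy expression language.
enum class Op : std::uint8_t { Push, Add, Mul };

struct Instr {
  Op op;
  int operand; // only meaningful for Op::Push
};

// Backend 1 (assumed name): records bytecode so it can be executed many times.
class RecordingEmitter {
public:
  void emitPush(int v) { code.push_back({Op::Push, v}); }
  void emitAdd() { code.push_back({Op::Add, 0}); }
  void emitMul() { code.push_back({Op::Mul, 0}); }
  std::vector<Instr> code;
};

// Backend 2 (assumed name): evaluates each operation as it is emitted.
// No bytecode is stored.
class DirectEmitter {
public:
  void emitPush(int v) { stack.push_back(v); }
  void emitAdd() { int r = pop(); int l = pop(); stack.push_back(l + r); }
  void emitMul() { int r = pop(); int l = pop(); stack.push_back(l * r); }
  int result() const { return stack.back(); }

private:
  int pop() { int v = stack.back(); stack.pop_back(); return v; }
  std::vector<int> stack;
};

// The lowering is written once against the emitter interface.
// It lowers the expression (a + b) * c.
template <typename Emitter>
void compileExpr(Emitter &e, int a, int b, int c) {
  e.emitPush(a);
  e.emitPush(b);
  e.emitAdd();
  e.emitPush(c);
  e.emitMul();
}

// Executes recorded bytecode by replaying it into a DirectEmitter.
inline int run(const std::vector<Instr> &code) {
  DirectEmitter vm;
  for (const Instr &i : code) {
    switch (i.op) {
    case Op::Push: vm.emitPush(i.operand); break;
    case Op::Add: vm.emitAdd(); break;
    case Op::Mul: vm.emitMul(); break;
    }
  }
  return vm.result();
}
```

Calling compileExpr with a DirectEmitter yields the value without storing any instructions, which corresponds to the single-use top-level case. Calling it with a RecordingEmitter yields a reusable instruction list, which corresponds to the function case.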

The interpreter relies on stack-based, strongly-typed opcodes. The glue logic between the code generator and the interpreter, along with the enumeration and description of opcodes, can be found in Opcodes.td. The opcodes are implemented as generic template methods in Interp.h and instantiated with the relevant primitive types by the interpreter loop or by the evaluating emitter.
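As a rough illustration of "generic template methods instantiated with primitive types", the following sketch implements one opcode as a template and selects the instantiation from a type tag. ToyStack, PrimKind and dispatchAdd are hypothetical names chosen for this example and do not reflect the contents of Interp.h or Opcodes.td. Overflow checking, which a real constant evaluator needs, is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy operand stack storing raw bytes; values are pushed and popped by type.
class ToyStack {
public:
  template <typename T> void push(T v) {
    const auto *p = reinterpret_cast<const unsigned char *>(&v);
    bytes.insert(bytes.end(), p, p + sizeof(T));
  }
  template <typename T> T pop() {
    T v;
    std::memcpy(&v, bytes.data() + bytes.size() - sizeof(T), sizeof(T));
    bytes.resize(bytes.size() - sizeof(T));
    return v;
  }
  bool empty() const { return bytes.empty(); }

private:
  std::vector<unsigned char> bytes;
};

// Generic opcode: written once, instantiated per primitive type.
// Overflow is not diagnosed in this sketch.
template <typename T> bool Add(ToyStack &s) {
  T rhs = s.pop<T>();
  T lhs = s.pop<T>();
  s.push<T>(static_cast<T>(lhs + rhs));
  return true;
}

// Hypothetical tag naming the primitive type an instruction operates on.
enum class PrimKind { Sint32, Uint8 };

// The dispatch loop picks the instantiation matching the instruction's type.
inline bool dispatchAdd(PrimKind k, ToyStack &s) {
  switch (k) {
  case PrimKind::Sint32: return Add<std::int32_t>(s);
  case PrimKind::Uint8: return Add<std::uint8_t>(s);
  }
  return false;
}
```

Because every opcode carries its operand type, the stack itself does not need per-slot type tags. The instruction stream alone determines how many bytes each operation consumes and produces.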

Primitive Types

Composite types

The interpreter distinguishes two kinds of composite types: arrays and records (structs and classes). Unions are represented as records, except that at most a single field can be marked as active. The contents of inactive fields are kept until they are reactivated and overwritten. Complex numbers (_Complex) and vectors (__attribute((vector_size(16)))) are treated as arrays.
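The active-field rule is the interpreter's way of enforcing a language rule that can be observed from ordinary C++ source. The snippet below is a small illustration at the language level, not interpreter code; the names IntOrFloat and readActive are made up for the example.

```cpp
union IntOrFloat {
  int i;
  float f;
};

constexpr int readActive() {
  IntOrFloat u{}; // empty-brace initialization makes `i` the active member
  u.i = 7;        // writing to the already active member is fine
  return u.i;     // reading the active member is a constant expression
}

static_assert(readActive() == 7, "reading the active member is allowed");

// Returning u.f from the function above instead would read an inactive
// member, which is not permitted during constant evaluation.
```

Tracking which field is active is what lets an evaluator reject the second case while accepting the first.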

Bytecode Execution

Bytecode is executed using a stack-based interpreter. The execution context consists of an InterpStack, along with a chain of InterpFrame objects storing the call frames. Frames are built by call instructions and destroyed by return instructions. They perform one allocation to reserve space for all locals in a single block. These objects store all the required information to emit stack traces whenever evaluation fails.
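The relationship between the operand stack and the frame chain can be sketched as follows. ToyFrame and ToyContext are hypothetical stand-ins, not the real InterpStack and InterpFrame classes. The sketch only models three properties named above: frames are created on call and destroyed on return, each frame reserves its locals in one allocation, and the caller links are enough to produce a stack trace.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy call frame: one allocation holds every local of the function.
struct ToyFrame {
  ToyFrame(std::string fn, std::size_t numLocals, ToyFrame *callerFrame)
      : name(std::move(fn)), locals(numLocals, 0), caller(callerFrame) {}
  std::string name;
  std::vector<int> locals; // single block reserved for all locals
  ToyFrame *caller;        // link to the calling frame
};

// Toy execution context: an operand stack plus a chain of frames.
class ToyContext {
public:
  // A "call instruction" builds a frame on top of the current one.
  void call(const std::string &fn, std::size_t numLocals) {
    frames.push_back(std::make_unique<ToyFrame>(fn, numLocals, current()));
  }
  // A "return instruction" destroys the innermost frame.
  void ret() { frames.pop_back(); }

  ToyFrame *current() const {
    return frames.empty() ? nullptr : frames.back().get();
  }

  // Walks the caller links, innermost frame first.
  std::vector<std::string> stackTrace() const {
    std::vector<std::string> trace;
    for (ToyFrame *f = current(); f != nullptr; f = f->caller)
      trace.push_back(f->name);
    return trace;
  }

  std::vector<int> operands; // stand-in for the operand stack

private:
  std::vector<std::unique_ptr<ToyFrame>> frames;
};
```

When evaluation fails inside a nested call, walking the caller links from the current frame is enough to report which calls led to the failure.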

Memory Organisation

Memory management in the interpreter relies on 3 data structures: Block objects which store the data and associated inline metadata, Pointer objects which refer to or into blocks, and Descriptor structures which describe blocks and subobjects nested inside blocks.

Blocks

Blocks contain data interleaved with metadata. They are allocated either statically in the code generator (globals, static members, dummy parameter values etc.) or dynamically in the interpreter, when creating the frame containing the local variables of a function. Blocks are associated with a descriptor that characterises the entire allocation, along with a few additional attributes:

Static blocks are never deallocated, but local ones might be deallocated even when there are live pointers to them. Pointers are only valid as long as the blocks they point to are valid, so a block with pointers to it whose lifetime ends is kept alive until all pointers to it go out of scope. Since the frame is destroyed on function exit, such blocks are turned into a DeadBlock and copied to storage managed by the interpreter itself, not the frame. Reads and writes to these blocks are illegal and cause an appropriate diagnostic to be emitted. When the last pointer goes out of scope, dead blocks are also deallocated.

The lifetime of blocks is managed through 3 methods stored in the descriptor of the block:

Non-static blocks track all the pointers into them through an intrusive doubly-linked list, required to adjust and invalidate all pointers when transforming a block into a dead block. If the lifetime of an object ends, all pointers to it are invalidated, emitting the appropriate diagnostics when dereferenced.
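The interaction between pointer tracking and dead blocks can be modelled in a few lines. The sketch below is a simplified toy with hypothetical names (ToyBlock, ToyPointer, moveToDead, read); it is not the Block, DeadBlock or Pointer implementation. It assumes a block holds a single int and models only the essentials: each block keeps an intrusive doubly-linked list of the pointers into it, moving a block to dead storage redirects every tracked pointer, and reads through a pointer to a dead block fail.

```cpp
#include <cassert>

struct ToyBlock;

// Toy pointer: the list links live inside the pointer itself (intrusive).
struct ToyPointer {
  ToyBlock *pointee = nullptr;
  ToyPointer *prev = nullptr;
  ToyPointer *next = nullptr;
};

struct ToyBlock {
  int value = 0;
  bool dead = false;
  ToyPointer *head = nullptr; // first pointer referring to this block

  void addPointer(ToyPointer *p) {
    p->pointee = this;
    p->prev = nullptr;
    p->next = head;
    if (head)
      head->prev = p;
    head = p;
  }

  void removePointer(ToyPointer *p) {
    if (p->prev)
      p->prev->next = p->next;
    else
      head = p->next;
    if (p->next)
      p->next->prev = p->prev;
    p->pointee = nullptr;
    p->prev = nullptr;
    p->next = nullptr;
  }

  bool hasPointers() const { return head != nullptr; }
};

// Copies a block whose frame is going away into storage that outlives the
// frame, marks it dead, and redirects every tracked pointer to the copy.
inline void moveToDead(ToyBlock &from, ToyBlock &deadStorage) {
  deadStorage.value = from.value;
  deadStorage.dead = true;
  deadStorage.head = from.head;
  for (ToyPointer *p = deadStorage.head; p != nullptr; p = p->next)
    p->pointee = &deadStorage;
  from.head = nullptr;
}

// Reads fail for null pointers and for pointers into dead blocks; a real
// evaluator would emit a diagnostic at this point.
inline bool read(const ToyPointer &p, int &out) {
  if (p.pointee == nullptr || p.pointee->dead)
    return false;
  out = p.pointee->value;
  return true;
}
```

The list is what makes the redirection possible: without it, the block would have no way to find the pointers that need adjusting when its frame is destroyed. Once the last pointer is removed, hasPointers() returns false and the dead storage can be released.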

The interpreter distinguishes 3 different kinds of blocks:

Inline descriptors are filled in by the CtorFn of blocks, which leaves storage in an uninitialised, but valid state.

Descriptors

Descriptors are generated at bytecode compilation time and contain information required to determine if a particular memory access is allowed in constexpr. They also carry all the information required to emit a diagnostic involving a memory access, such as the declaration which originates the block. Currently there is a single kind of descriptor encoding information for all block types.

Pointers

Pointers, implemented in Pointer.h, are represented as a tagged union.

Besides the previously mentioned union, a number of other pointer-like types have their own type:

BlockPointer

Block pointers track a Pointee, the block to which they point, along with a Base and an Offset. The base identifies the innermost field, while the offset points to an array element relative to the base (including one-past-end pointers). The offset identifies the array element or field which is referenced, while the base points to the outer object or array which contains the field. These two fields allow all pointers to be uniquely identified, disambiguated and characterised.

As an example, consider the following structure:

    struct A {
      struct B { int x; int y; } b;
      struct C { int a; int b; } c[2];
      int z;
    };
    constexpr A a{};

On the target, &a and &a.b.x are equal. So are &a.c[0] and &a.c[0].a. In the interpreter, all these pointers must be distinguished since they are all allowed to address distinct ranges of memory.
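The difference between target addresses and interpreter pointers can be shown with a short sketch. The first helper checks the target-level claim using real addresses. ToyBlockPointer is a hypothetical stand-in for a block pointer, reduced to the three fields named above, and the numeric base and offset values used with it are arbitrary placeholders rather than the layout the interpreter actually produces.

```cpp
#include <cassert>
#include <cstddef>

struct A {
  struct B { int x; int y; } b;
  struct C { int a; int b; } c[2];
  int z;
};

// On the target, an object and its first member share one address.
inline bool sameTargetAddress(const void *p, const void *q) { return p == q; }

// Toy block pointer: identity is the triple (pointee, base, offset),
// not a single machine address.
struct ToyBlockPointer {
  const void *pointee; // the block being pointed into
  std::size_t base;    // placeholder value in this sketch
  std::size_t offset;  // placeholder value in this sketch
};

inline bool operator==(const ToyBlockPointer &l, const ToyBlockPointer &r) {
  return l.pointee == r.pointee && l.base == r.base && l.offset == r.offset;
}
```

Two toy pointers into the same block compare unequal as soon as their base or offset differs, even when the corresponding target addresses coincide. This is the property that lets an evaluator tell a pointer to the whole object apart from a pointer to its first field.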

In the interpreter, the object would require 240 bytes of storage and would have its fields interleaved with metadata. The pointers which can be derived to the object are illustrated in the following diagram:

0   16  32  40  56  64  80  96  112 120 136 144 160 176 184 200 208 224 240
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The Base offset of all pointers points to the start of a field or an array and is preceded by an inline descriptor (unless Base is zero, pointing to the root). All the relevant attributes can be read from either the inline descriptor or the descriptor of the block.

Array elements are identified by the Offset field of pointers, pointing past the inline descriptors for composites and before the actual data in the case of primitive arrays. The Offset points to the offset where primitives can be read from. As an example, a.c + 1 would have the same base as a.c since it is an element of a.c, but its offset would point to &a.c[1]. The array-to-pointer decay operation adjusts a pointer to an array (where the offset is equal to the base) to a pointer to the first element.
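The base and offset arithmetic for arrays can be sketched with a deliberately simplified layout. Everything here is an assumption made for illustration: the ToyArrayPointer type, the 16-byte metadata size, the 8-byte element payload, and the rule that each element consists of exactly one metadata slot followed by its data. The real layout of composite elements is more involved, since nested fields carry their own inline descriptors.

```cpp
#include <cassert>
#include <cstddef>

// Toy pointer into an array: `base` marks the array, `offset` the element.
struct ToyArrayPointer {
  std::size_t base;
  std::size_t offset;
};

// Hypothetical layout constants for this sketch only.
constexpr std::size_t kElemMetadata = 16; // assumed per-element metadata size
constexpr std::size_t kElemData = 8;      // assumed per-element payload size
constexpr std::size_t kElemStride = kElemMetadata + kElemData;

// A pointer designates the array itself when offset equals base.
constexpr bool pointsToArray(ToyArrayPointer p) { return p.offset == p.base; }

// Array-to-pointer decay: keep the base, move the offset past the first
// element's metadata so that it designates element 0.
constexpr ToyArrayPointer decay(ToyArrayPointer p) {
  return {p.base, p.base + kElemMetadata};
}

// Pointer arithmetic keeps the base and moves only the offset.
constexpr ToyArrayPointer advance(ToyArrayPointer p, std::size_t n) {
  return {p.base, p.offset + n * kElemStride};
}
```

In this model, the pointer for a.c + 1 is advance(decay(arr), 1), where arr is the pointer to the array: the base is unchanged and still identifies the array, while the offset identifies the second element.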

TypeInfoPointer

TypeInfoPointer tracks two types: the type assigned to std::type_info and the type which was passed to typeid. It is part of the tagged union in Pointer.
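At the source level, the two types involved are visible in any typeid expression: the expression itself refers to an object of type const std::type_info, while the operand names the type being described. The snippet below only illustrates that distinction in ordinary C++; it does not show how TypeInfoPointer is implemented, and the helper names are made up.

```cpp
#include <cassert>
#include <typeinfo>

// The returned reference has type const std::type_info&.
// The operand type (int) is a separate piece of information.
inline const std::type_info &infoForInt() { return typeid(int); }

// Two results describe the same operand type exactly when they compare equal.
inline bool describeSameType(const std::type_info &l,
                             const std::type_info &r) {
  return l == r;
}
```

A pointer to such an object therefore has to remember both facts to be evaluated correctly: that it points at a std::type_info object, and which type that object describes.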