Solidity IR-based Codegen Changes (Solidity 0.8.31 documentation)

Solidity can generate EVM bytecode in two different ways: Either directly from Solidity to EVM opcodes (“old codegen”) or through an intermediate representation (“IR”) in Yul (“new codegen” or “IR-based codegen”).

The IR-based code generator was introduced both to make code generation more transparent and auditable and to enable more powerful optimization passes that span across functions.

You can enable it on the command line using --via-ir, or with the option {"viaIR": true} in standard-json, and we encourage everyone to try it out!

For several reasons, there are tiny semantic differences between the old and the IR-based code generator, mostly in areas where we would not expect people to rely on this behavior anyway. This section highlights the main differences between the old and the IR-based codegen.

Semantic Only Changes

This section lists the changes that are semantic-only, thus potentially hiding new and different behavior in existing code.

Internals

Internal function pointers

The old code generator uses code offsets or tags for values of internal function pointers. This is especially complicated since these offsets are different at construction time and after deployment and the values can cross this border via storage. Because of that, both offsets are encoded at construction time into the same value (into different bytes).

In the new code generator, function pointers use internal IDs that are allocated in sequence. Since calls via jumps are not possible, calls through function pointers always have to use an internal dispatch function that uses the switch statement to select the right function.

The ID 0 is reserved for uninitialized function pointers which then cause a panic in the dispatch function when called.
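The ID-based dispatch described above can be pictured with a hand-written analogue in plain Solidity. This is only a sketch of the idea, not the code the compiler emits: the contract name, the function names and the concrete ID values are all hypothetical, an if-chain stands in for the Yul switch statement, and a plain revert stands in for the panic.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.1;

// Hypothetical illustration of ID-based dispatch; the names and ID values
// are assumptions for this sketch and are not taken from compiler output.
contract DispatchSketch {
    uint constant ID_TWICE = 1;
    uint constant ID_SQUARE = 2;

    function twice(uint x) internal pure returns (uint) { return 2 * x; }
    function square(uint x) internal pure returns (uint) { return x * x; }

    // Selects the target function by its ID instead of jumping to a code offset.
    function dispatch(uint id, uint x) internal pure returns (uint) {
        if (id == ID_TWICE) return twice(x);
        if (id == ID_SQUARE) return square(x);
        // ID 0 (uninitialized) and any unknown ID end up here. The generated
        // dispatch function panics; this sketch uses a plain revert instead.
        revert("invalid function ID");
    }

    function callById(uint id, uint x) public pure returns (uint) {
        return dispatch(id, x);
    }
}
```

In this sketch, callById(1, 3) goes through twice and callById(2, 3) through square, while callById(0, 3) reverts, mirroring the reserved ID 0.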

In the old code generator, internal function pointers are initialized with a special function that always causes a panic. This causes a storage write at construction time for internal function pointers in storage.
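As a minimal sketch of the observable behavior, the contract below declares an internal function pointer in storage and calls it without ever assigning it. The contract, variable and function names are hypothetical. Both code generators should end in a panic here (the Solidity documentation lists error code 0x51 for calling a zero-initialized variable of internal function type); only the mechanism that produces the panic differs, as described above.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.1;

// Hypothetical example: the names are chosen for illustration only.
contract UninitializedPointer {
    // Internal function pointer kept in storage; never assigned in this sketch.
    function(uint) internal pure returns (uint) transform;

    function run(uint x) public view returns (uint) {
        // Calling the uninitialized pointer is expected to panic
        // instead of returning a value.
        return transform(x);
    }
}
```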

Note

The compiler is free to omit internal functions that are never explicitly referenced by name. As a consequence, assigning to a function type variable in inline assembly does not guarantee that the assigned value will be included in the internal dispatch. The function must also be explicitly referenced elsewhere in the code.

Cleanup

The old code generator only performs cleanup before an operation whose result could be affected by the values of the dirty bits. The new code generator performs cleanup after any operation that can result in dirty bits. The hope is that the optimizer will be powerful enough to eliminate redundant cleanup operations.

For example:


    // SPDX-License-Identifier: GPL-3.0
    pragma solidity >=0.8.1;

    contract C {
        function f(uint8 a) public pure returns (uint r1, uint r2) {
            a = ~a;
            assembly {
                r1 := a
            }
            r2 = a;
        }
    }

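The function f(1) returns the following values:

Old code generator: r1 = 2**256 - 2 (0xfe with all 248 higher-order bits set), r2 = 0xfe

New code generator: r1 = 0xfe, r2 = 0xfe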

Note that, unlike the new code generator, the old code generator does not perform a cleanup after the bit-not assignment (a = ~a). This results in different values being assigned (within the inline assembly block) to return value r1 between the old and new code generators. However, both code generators perform a cleanup before the new value of a is assigned to r2.
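To make the notion of "cleanup" concrete, the following sketch performs the bit-not entirely in inline assembly, once without and once with an explicit mask. It does not depend on which code generator is used, because the compiler does not insert cleanup inside the assembly block. The contract and function names are hypothetical, and the 0xff mask is simply the value range of uint8.

```solidity
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.8.1;

// Hypothetical example: shows dirty higher-order bits and a manual cleanup.
contract CleanupSketch {
    // No cleanup: not() flips all 256 bits, so the 248 bits above the
    // uint8 range are set in the result.
    function dirty(uint8 a) public pure returns (uint r) {
        assembly {
            r := not(a)
        }
    }

    // Manual cleanup: masking with 0xff keeps only the lowest 8 bits,
    // which is what a cleanup of a uint8 value amounts to.
    function cleaned(uint8 a) public pure returns (uint r) {
        assembly {
            r := and(not(a), 0xff)
        }
    }
}
```

For the input 1, dirty returns 2**256 - 2 and cleaned returns 0xfe, matching the two r1 values that the code generators produce for f(1).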