[RFC] Optimizing Code Size of objc_direct by Exposing Function Symbols and Moving Nil Checks to Thunks

Motivation

The initial implementation of __attribute__((objc_direct)) (hereafter objc_direct) was designed with an emphasis on ABI stability. While this design achieved its ABI goal, it has led to drawbacks, primarily related to code size and linkage. Currently, we have the following three main issues:

  1. Code Bloat and Poor Optimization: The nil-checking logic (and for class methods, class realization logic) is duplicated in every objc_direct method implementation, contributing to binary size. For instance methods, the callee always performs a nil check, even at non-null call sites (like self), adding unnecessary code.
  2. Poor Linkage: The hidden symbol (internal linkage) prevents developers from calling the direct method from other link units, forcing them to write manual wrapper thunks, further increasing code size.
  3. Complexity in Swift interop: We intend to implement @objcDirect on the Swift side as well, but a direct method that must perform its own nil check pushes considerable implementation difficulty onto SILGen. For example, the exposed API would have type Optional<MyClass>, and the Swift version of the nil check would need to emit code to unwrap that Optional before calling the actual Swift function.

This proposal aims to change the implementation strategy and resolve the three issues above by exposing the true implementation symbol and moving the responsibility for nil-checking to a caller-generated thunk. This way, we can reduce code size by eliminating unnecessary nil checks and make bridging @objcDirect to Swift easier.

Design

The core of this design is to split the responsibilities for objc_direct calls. The callee will emit a public-facing, external implementation, and the caller will decide whether to call it directly or via a newly generated thunk. We plan to initially gate the feature behind a new compiler flag, -fobjc-direct-caller-thunks, for experimentation, before making it the default behavior.

This design differs slightly for instance methods and class methods.

Instance Methods

Current Design

A method -[MyClass myMethod] is emitted with a hidden symbol (e.g., @"\01-[MyClass myMethod]"). This function contains the self == nil check. All call sites call this hidden symbol, and the nil check is executed every time.

Proposed Design

The responsibility of performing the nil check is split between the true implementation and a caller-side thunk.

1. True Implementation (Callee):

2. Call Site (Caller): When Clang encounters [receiver myMethod]:

3. The Thunk (Generated by Caller):

This musttail call is critical for ARC correctness, as it makes the thunk “invisible” to the ARC contract.
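The instance-method split can be modeled in plain C. This is a minimal sketch under stated assumptions, not Clang's actual output: the function names, the `MUSTTAIL` macro, and the `nil_checks_executed` counter are hypothetical and exist only for illustration. The thunk returns 0 for a nil receiver, mirroring the usual Objective-C rule that messaging nil yields zero.

```c
#include <stddef.h>

/* Use the musttail statement attribute when the compiler offers it;
 * otherwise fall back to an ordinary return (hypothetical helper macro). */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

typedef struct MyClass { int value; } MyClass;

/* Instrumentation for this sketch only: counts nil checks that actually run. */
int nil_checks_executed = 0;

/* Models the true implementation: exposed symbol, assumes self is non-nil. */
int MyClass_myMethod_impl(MyClass *self, int delta) {
    return self->value + delta;
}

/* Models the caller-generated thunk: nil check, then a tail call. */
int MyClass_myMethod_thunk(MyClass *self, int delta) {
    nil_checks_executed++;
    if (self == NULL) {
        return 0; /* messaging nil yields zero */
    }
    MUSTTAIL return MyClass_myMethod_impl(self, delta);
}

/* Call site whose receiver may be nil: goes through the thunk. */
int call_with_unknown_receiver(MyClass *maybe_nil) {
    return MyClass_myMethod_thunk(maybe_nil, 1);
}

/* Call site whose receiver is provably non-nil: calls the implementation directly. */
int call_with_known_receiver(void) {
    MyClass obj = { 41 };
    return MyClass_myMethod_impl(&obj, 1);
}
```

In this model only the call site with an unknown receiver pays for the nil check; the call site with a known non-nil receiver skips it entirely.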

Corner Case: Variadic Methods (va_arg)

Variadic methods are excluded from this change. This is because forwarding their arguments is fundamentally incompatible with the thunk’s design: our design requires a musttail call for ARC correctness, which forbids any stack management (like va_start/va_end) in the thunk.

To maintain 100% backward compatibility, objc_direct can still be attached to a variadic method, but it will have the old, hidden ABI (hidden \01 symbol with internal nil check).
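To illustrate the variadic constraint in plain C terms: forwarding a variable argument list in portable C requires va_start and va_end around the inner call, so that call is not in tail position. The sketch below models the retained old behavior, with the nil check kept inside the variadic entry point. The `Adder` type and both function names are hypothetical, and this says nothing about how Clang lowers such methods.

```c
#include <stdarg.h>
#include <stddef.h>

typedef struct Adder { int base; } Adder;

/* Worker that consumes an already-started va_list. */
int Adder_sum_v(Adder *self, int count, va_list args) {
    int total = self->base;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    return total;
}

/* Variadic entry point: the nil check stays in the callee. */
int Adder_sum(Adder *self, int count, ...) {
    if (self == NULL) {
        return 0; /* messaging nil yields zero */
    }
    va_list args;
    va_start(args, count);
    int total = Adder_sum_v(self, count, args);
    va_end(args); /* cleanup after the call, so the call above is not a tail call */
    return total;
}
```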

Class Methods

Class methods introduce the separate problem of class realization.

Current Design

A method +[MyClass myMethod] is emitted with a hidden symbol (e.g., @"\01+[MyClass myMethod]"). This function performs:

  1. Class Realization: It calls [self self] to ensure the class object is loaded.
  2. Nil Check: If the class object is weakly linked, it also checks whether the class object exists at runtime (i.e., a nil check).

Proposed Design

We will follow a similar pattern, preserving the existing logic for when to nil-check.

1. True Implementation (Callee):

2. The Thunk (Generated by Caller): This logic is the same as for instance methods, except that the thunk performs class realization before the nil check. The nil check is only emitted when we cannot prove that the class object is non-null, which is only the case when the class is weakly linked (isWeakLinkedClass(OID)).

3. Call Site (Caller): When Clang encounters a call to +[MyClass myMethod], the caller needs to determine whether the class object can be null (isWeakLinkedClass(OID)) and whether the class has already been realized. If the class object is known to be non-null and the class is known to be realized, the caller dispatches directly to the true implementation; otherwise, it calls the thunk.
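The class-method dispatch decision can be sketched the same way. This is a hedged C model, not the real code generation: `ClassObject`, `realize_class` (standing in for `[self self]`), the two thunk variants, and the boolean parameters are all hypothetical. In the actual design, weak linkage and realization knowledge are compile-time facts, so the caller would pick one path statically rather than branching at runtime.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ClassObject {
    bool realized;
    int tag;
} ClassObject;

/* Stand-in for [self self]: realizes the class if it exists. */
static ClassObject *realize_class(ClassObject *cls) {
    if (cls != NULL) {
        cls->realized = true;
    }
    return cls;
}

/* Models the true implementation: assumes a non-null, realized class. */
int MyClass_classMethod_impl(ClassObject *self, int x) {
    return self->tag + x;
}

/* Thunk for a strongly linked class: realization only. */
int MyClass_classMethod_thunk(ClassObject *self, int x) {
    self = realize_class(self);
    return MyClass_classMethod_impl(self, x);
}

/* Thunk for a weakly linked class: realization, then nil check. */
int MyClass_classMethod_thunk_weak(ClassObject *self, int x) {
    self = realize_class(self);
    if (self == NULL) {
        return 0; /* class missing at runtime */
    }
    return MyClass_classMethod_impl(self, x);
}

/* Models the caller's choice between direct dispatch and a thunk. */
int call_class_method(ClassObject *cls, bool weak_linked,
                      bool known_realized, int x) {
    if (!weak_linked && known_realized) {
        return MyClass_classMethod_impl(cls, x); /* direct dispatch */
    }
    return weak_linked ? MyClass_classMethod_thunk_weak(cls, x)
                       : MyClass_classMethod_thunk(cls, x);
}
```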

We need static analysis heuristics to determine whether a class object has been realized. A simple heuristic: if a call to a method of the same class dominates the current call, the class object must have been realized by that earlier call. Extra care is needed here: even if a call to [Parent foo] dominates a call to [Child foo], the call to [Child foo] must still go through class realization to make sure Child is realized. Static types are easy to reason about, but when the receiver's type is id, the analysis is not trivial.
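As a rough illustration of the dominance heuristic, the sketch below handles only the simplest case: a straight-line sequence of calls, where every earlier call dominates every later one. Each call is described by a small integer class id, with a negative id standing for a receiver of unknown static type (such as id). The function name, the id encoding, and `MAX_CLASSES` are hypothetical, and weak linkage is ignored here.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_CLASSES = 8 };

/* For each call in a straight-line sequence, decide whether it must go
 * through the realizing thunk. A call may skip the thunk only if an
 * earlier call targeted exactly the same class. */
void plan_dispatch(const int *receiver_class, size_t n_calls,
                   bool *needs_thunk) {
    bool known_realized[MAX_CLASSES] = { false };
    for (size_t i = 0; i < n_calls; ++i) {
        int cls = receiver_class[i];
        bool is_known_class = cls >= 0 && cls < MAX_CLASSES;
        /* Unknown receivers always take the thunk and prove nothing. */
        needs_thunk[i] = !(is_known_class && known_realized[cls]);
        if (is_known_class) {
            known_realized[cls] = true;
        }
    }
}
```

With Parent as id 0 and Child as id 1, the sequence Parent, Child, Parent, Child sends the first two calls through the thunk and lets the last two dispatch directly: the earlier Parent call does not excuse the first Child call.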

Previous Approaches and Why This is Better

Previous attempts (like #126639) explored a “two-symbol” approach where the callee module would emit both the old hidden symbol (for ABI) and a new exposed symbol.

This “two-symbol” design is more complex:

The design proposed here improves on it in these ways:

cc @rjmccall @sharonxu @AdamCmiel

Edit: per discussion with @rjmccall, the linkage of the true implementation doesn’t need to be linkonce_odr; only the thunk needs to be linkonce_odr.
Edit 2: Updated the class method’s thunk and dispatch logic after discussion with John.