[RFC] Optimizing Code Size of objc_direct by Exposing Function Symbols and Moving Nil Checks to Thunks
Motivation
The initial implementation of __attribute__((objc_direct)) (hereafter objc_direct) was designed with an emphasis on ABI stability. While this design achieved its ABI goal, it has led to drawbacks, primarily related to code size and linkage. Currently, we have the following three main issues:
- Code Bloat and Poor Optimization: The nil-checking logic (and, for class methods, the class realization logic) is duplicated in every objc_direct method implementation, contributing to binary size. For instance methods, the callee always performs a nil check, even at non-null call sites (like self), adding unnecessary code.
- Poor Linkage: The hidden symbol (internal linkage) prevents developers from calling the direct method from other link units, forcing them to write manual wrapper thunks, further increasing code size.
- Complexity on Swift interop: We intend to implement @objcDirect on the Swift side as well. But a lot of implementation difficulty would fall on SILGen if a direct method needs to do a nil check (e.g., the exposed API would have an Optional<MyClass> type, and the Swift version of the nil check would need to emit code to unwrap that Optional before calling the actual Swift function).
This proposal aims to change the implementation strategy and resolve the three issues above by exposing the true implementation symbol and moving the responsibility for nil-checking to a caller-generated thunk. This way, we can reduce code size by eliminating unnecessary nil checks and make bridging @objcDirect to Swift easier.
Design
The core of this design is to split the responsibilities for objc_direct calls. The callee will emit a public-facing, external implementation, and the caller will decide whether to call it directly or via a newly generated thunk. We plan to initially gate the feature behind a new compiler flag, -fobjc-direct-caller-thunks, for experimentation, before making it the default behavior.
This design differs slightly for instance methods and class methods.
Instance Methods
Current Design
A method -[MyClass myMethod] is emitted with a hidden symbol (e.g., @"\01-[MyClass myMethod]"). This function contains the self == nil check. All call sites call this hidden symbol, and the nil check is executed every time.
Proposed Design
The responsibility for performing the nil check is split between the true implementation and a caller-side thunk.
1. True Implementation (Callee):
- Symbol: Emitted with its public, mangled name (e.g., @"-[MyClass myMethod]").
- Linkage: external.
- Logic: Contains only the method’s implementation. It performs no nil check.
2. Call Site (Caller): When Clang encounters [receiver myMethod]:
- Case 1: Receiver is Non-Null. If static analysis proves the receiver is non-null (e.g., self), Clang emits a direct call to the public symbol @"-[MyClass myMethod]".
- Case 2: Receiver is Nullable. If the receiver may be nil, Clang emits a call to a caller-side thunk.
3. The Thunk (Generated by Caller):
- Symbol: Generated with a suffix (e.g., @"-[MyClass myMethod]_thunk").
- Linkage: linkonce_odr, so that when multiple callers in different link units generate identical thunks, the linker doesn’t complain.
- Logic:
  - Performs the self == nil check. If nil, returns a zero-initialized value.
  - If non-nil, performs a musttail call to the true implementation (@"-[MyClass myMethod]").
This musttail call is critical for ARC correctness, as it makes the thunk “invisible” to the ARC contract.
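As a rough illustration of the split described above, here is a minimal plain-C model. It is not Clang output: the MyClass struct and the names my_method_impl, my_method_thunk, call_on_known_object and call_on_maybe_nil are hypothetical stand-ins for the symbols -[MyClass myMethod] and -[MyClass myMethod]_thunk. `static inline` only approximates linkonce_odr, and the tail call is left to the optimizer here, whereas the real thunk would use a musttail call in IR.

```c
#include <stddef.h>

/* Hypothetical stand-in for an Objective-C object. */
typedef struct MyClass {
    int value;
} MyClass;

/* Models the "true implementation" -[MyClass myMethod]:
   external linkage, no nil check. Callers must guarantee self != NULL. */
int my_method_impl(MyClass *self) {
    return self->value + 1;
}

/* Models the caller-generated thunk -[MyClass myMethod]_thunk.
   `static inline` is only an approximation of linkonce_odr. */
static inline int my_method_thunk(MyClass *self) {
    if (self == NULL) {
        return 0;                 /* nil receiver: zero-initialized result */
    }
    return my_method_impl(self);  /* in IR this would be a musttail call */
}

/* Case 1: the receiver is provably non-null, so call the implementation directly. */
int call_on_known_object(void) {
    MyClass obj = { 41 };
    return my_method_impl(&obj);
}

/* Case 2: the receiver may be nil, so go through the thunk. */
int call_on_maybe_nil(MyClass *receiver) {
    return my_method_thunk(receiver);
}
```

In this model, a translation unit that only ever makes Case 1 calls never instantiates the thunk at all, which is the code-size effect the proposal is after.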
Corner Case: Variadic Methods (va_arg)
Variadic methods are excluded from this change. This is because forwarding their arguments is fundamentally incompatible with the thunk’s design: our design requires a musttail call for ARC correctness, which forbids any stack management (like va_start/va_end) in the thunk.
To maintain 100% backward compatibility, objc_direct can still be attached to a variadic method, but it will have the old, hidden ABI (hidden \01 symbol with internal nil check).
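The following plain-C sketch shows the C-level analogue of the forwarding problem; it is only an illustration, not the Clang implementation. The names sum_v, sum_method and sum_method_wrapper are hypothetical. In portable C, a nil-checking wrapper around a variadic function cannot pass "..." through: it has to open the argument list itself with va_start, and va_end must run after the inner call returns, so that call cannot be a tail call.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical va_list-taking core: sums `count` int arguments. */
static int sum_v(int count, va_list args) {
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    return total;
}

/* Hypothetical variadic "method"; self is modeled as an opaque pointer. */
int sum_method(void *self, int count, ...) {
    (void)self;
    va_list args;
    va_start(args, count);
    int total = sum_v(count, args);
    va_end(args);
    return total;
}

/* A nil-checking wrapper. It cannot forward "..." to sum_method, so it
   does its own va_start/va_end around the call to the va_list core. */
int sum_method_wrapper(void *self, int count, ...) {
    if (self == NULL) {
        return 0;                 /* nil receiver: zero-initialized result */
    }
    va_list args;
    va_start(args, count);
    int total = sum_v(count, args);
    va_end(args);                 /* work after the call: not a tail call */
    return total;
}
```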
Class Methods
Class methods introduce the separate problem of class realization.
Current Design
A method +[MyClass myMethod] is emitted with a hidden symbol (e.g., @"\01+[MyClass myMethod]"). This function performs:
- Class Realization: It calls [self self] to ensure the class object is loaded.
- Nil Check: If the class object is weakly linked, it also checks whether the class object exists at runtime, i.e., a nil check.
Proposed Design
We will follow a similar pattern, preserving the existing logic for when to nil-check.
1. True Implementation (Callee):
- Symbol: Emitted with its public, mangled name (e.g., @"+[MyClass myMethod]").
- Linkage: external.
- Logic: The function has neither the nil check nor class realization. It contains only the method’s true implementation.
2. The Thunk (Generated by Caller): The logic is the same as for instance methods, except that the thunk performs class realization before the nil check. The nil check is only emitted when we cannot prove that the class object is non-null, which is only the case when the class is weakly linked (isWeakLinkedClass(OID)).
3. Call Site (Caller): When Clang encounters a call to +[MyClass myMethod], the caller needs to determine whether the class object can be null (isWeakLinkedClass(OID)) and whether the class has already been realized. If the class object cannot be null and the class is known to be realized, the caller dispatches directly to the true implementation; otherwise, it calls the thunk.
We need some static analysis heuristics to determine whether a class object has been realized. A simple heuristic: if a call to a method of the same class dominates the current call, the class object must already have been realized by the earlier call. Extra care is needed here: even if a call to [Parent foo] dominates a call to [Child foo], the call to [Child foo] still needs to go through class realization to make sure Child is realized. While static types are easy to reason about, things are not trivial when the type is id.
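The class-method flow can be modeled in plain C along the following lines. This is a sketch under stated assumptions, not the actual codegen: ClassObject, realize, class_method_impl, class_method_thunk and call_class_method are hypothetical names, the weak_linked flag stands in for isWeakLinkedClass(OID), and known_realized stands in for whatever the dominance heuristic concludes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a class object. */
typedef struct ClassObject {
    bool realized;
} ClassObject;

/* Models class realization (what [self self] triggers in the current design). */
static ClassObject *realize(ClassObject *cls) {
    if (cls != NULL) {
        cls->realized = true;
    }
    return cls;
}

/* Models the true implementation +[MyClass myMethod]:
   no class realization and no nil check. */
int class_method_impl(ClassObject *cls) {
    (void)cls;
    return 7;
}

/* Models the caller-side thunk: realize first, then nil-check only
   when the class is weakly linked and may be absent at runtime. */
static inline int class_method_thunk(ClassObject *cls, bool weak_linked) {
    cls = realize(cls);
    if (weak_linked && cls == NULL) {
        return 0;                 /* missing class: zero-initialized result */
    }
    return class_method_impl(cls);
}

/* Models the call-site decision: go direct only when the class object
   cannot be nil and this exact class is already known to be realized. */
int call_class_method(ClassObject *cls, bool weak_linked, bool known_realized) {
    if (!weak_linked && known_realized) {
        return class_method_impl(cls);
    }
    return class_method_thunk(cls, weak_linked);
}
```

Note that known_realized has to be tracked per class: in the [Parent foo] / [Child foo] example, a prior call on Parent would not justify passing true for Child.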
Previous Approaches and Why This is Better
Previous attempts (like #126639) explored a “two-symbol” approach where the callee module would emit both the old hidden symbol (for ABI) and a new exposed symbol.
This “two-symbol” design is more complex:
- It requires the callee to emit two versions, bloating the callee module,
- The caller must be “smart” enough to know which of the two symbols to call,
- The Swift frontend still needs to emit a thunk.
The design proposed here improves on it in these ways:
- Single Implementation Source: The callee only emits one function: the “true implementation” (with external linkage). This is simple and clean.
- Caller-Side Generation: The nil-checking thunk is generated by the caller and only if needed. A module that only makes non-null calls will generate zero thunks, achieving maximum optimization.
- Better Code Size: We trade N (N = number of objc_direct methods) duplicated nil-checks for M (M = number of objc_direct methods that are actually called nullably) thunks. Since M is at most N, this improves code size and efficiency whenever some direct methods are never called on a nullable receiver.
- Better Swift interoperability: This change makes a Swift @objcDirect attribute easier to implement: the existing thunk generated for @objc can be reused with little change.
cc @rjmccall @sharonxu @AdamCmiel
Edit: per discussion with @rjmccall , the linkage of the true implementation doesn’t need to be linkonce_odr, only the thunk needs to be linkonce_odr
Edit 2: Update class method’s thunk and dispatch logic after discussion with John